DOCUMENT RESUME

ED 241 797                                              CE 038 660

AUTHOR        Burnside, Billy L.
TITLE         Subjective Appraisal as a Feedback Tool. Technical
              Report 604.
INSTITUTION   Army Research Inst. for the Behavioral and Social
              Sciences, Alexandria, Va.
PUB DATE      May 82
NOTE          44p.; Developed at ARI Field Unit, Fort Knox, KY.
PUB TYPE      Reports - Research/Technical (143)
EDRS PRICE    MF01/PC02 Plus Postage.
DESCRIPTORS   Evaluation; Evaluation Methods; Feedback; Informal
              Assessment; *Interviews; *Job Performance; *Military
              Personnel; Military Training; Peer Evaluation;
              *Personnel Evaluation; Self Evaluation (Individuals);
              *Surveys
IDENTIFIERS   *Accuracy; *Subjective Evaluation

ABSTRACT






This report examines the accuracy of subjective appraisals of several
aspects of task performance, including proficiency, difficulty,
frequency, and criticality. An introduction discusses current Army use
of subjective appraisal, feedback methods, and problems with subjective
appraisal. Data pertaining to the accuracy of various types of appraisal
are summarized in the next section. The types of appraisal included are
proficiency appraisals, task criticality appraisals, task difficulty
appraisals, task frequency estimates, and appraisal of training
materials. At the end of the section, research from the cognitive
psychology literature relating to human ability to make accurate
subjective appraisals is discussed. The third section summarizes data
relating to the relative accuracy of appraisals obtained from
alternative sources, namely supervisor, self-, and peer appraisals. In
the fourth section, discussion of the issue of how subjective appraisals
are formulated centers around survey and interview techniques.
Suggestions for increasing the accuracy of subjective appraisals are
made, focusing on phrasing of questions, raters' experience, and other
characteristics of raters. The paper concludes with suggestions for
optimizing combined use of the survey and interview approaches. (VLB)






Technical Report 604 

SUBJECTIVE APPRAISAL AS A
FEEDBACK TOOL

Billy L. Burnside

ARI FIELD UNIT AT FORT KNOX, KENTUCKY

U. S. Army
Research Institute for the Behavioral and Social Sciences

May 1982

Approved for public release; distribution unlimited.



U.S. ARMY RESEARCH INSTITUTE
FOR THE BEHAVIORAL AND SOCIAL SCIENCES

A Field Operating Agency under the Jurisdiction of the
Deputy Chief of Staff for Personnel

JOSEPH ZEIDNER                          L. NEALE COSBY
Technical Director                      Colonel, IN
                                        Commander

NOTICES

DISTRIBUTION: Primary distribution of this report has been made by ARI.
Please address correspondence concerning distribution of reports to: U.S.
Army Research Institute for the Behavioral and Social Sciences, ATTN:
PERI-JST, 5001 Eisenhower Avenue, Alexandria, Virginia 22333.

FINAL DISPOSITION: This report may be destroyed when it is no longer
needed. Please do not return it to the U.S. Army Research Institute for
the Behavioral and Social Sciences.

NOTE: The findings in this report are not to be construed as an official
Department of the Army position, unless so designated by other authorized
documents.






UNCLASSIFIED

SECURITY CLASSIFICATION OF THIS PAGE (When Data Entered)

REPORT DOCUMENTATION PAGE
(READ INSTRUCTIONS BEFORE COMPLETING FORM)

1. REPORT NUMBER: Technical Report 604

2. GOVT ACCESSION NO.:

3. RECIPIENT'S CATALOG NUMBER:

4. TITLE (and Subtitle): SUBJECTIVE APPRAISAL AS A FEEDBACK TOOL

5. TYPE OF REPORT & PERIOD COVERED: Interim Report,
   October 1981 - May 1982

6. PERFORMING ORG. REPORT NUMBER:

7. AUTHOR(s): Billy L. Burnside (ARI)

8. CONTRACT OR GRANT NUMBER(s):

9. PERFORMING ORGANIZATION NAME AND ADDRESS:
   US Army Research Institute for the Behavioral
   and Social Sciences, 5001 Eisenhower Avenue
   Alexandria, VA 22333

10. PROGRAM ELEMENT, PROJECT, TASK AREA & WORK UNIT NUMBERS:
    2Q26374A794

11. CONTROLLING OFFICE NAME AND ADDRESS:
    US Army Research Institute for the Behavioral
    and Social Sciences, 5001 Eisenhower Avenue
    Alexandria, VA 22333

12. REPORT DATE: May 1982

13. NUMBER OF PAGES:

14. MONITORING AGENCY NAME & ADDRESS (if different from Controlling
    Office):

15. SECURITY CLASS. (of this report): UNCLASSIFIED

15a. DECLASSIFICATION/DOWNGRADING SCHEDULE:

16. DISTRIBUTION STATEMENT (of this Report):
    Approved for public release; distribution unlimited.

17. DISTRIBUTION STATEMENT (of the abstract entered in Block 20, if
    different from Report):

18. SUPPLEMENTARY NOTES:

19. KEY WORDS:
    Training Feedback
    Training Effectiveness
    Training Management
    Training Development
    Training Evaluation
    Human Performance
    Instructional Design

20. ABSTRACT (Continue on reverse side if necessary and identify by
    block number):

    This report examines the accuracy of subjective appraisals of several
aspects of task performance, including proficiency, difficulty, frequency,
and criticality. The relative accuracy of subjective appraisals collected
from various sources by various methods is discussed, and suggestions are
developed for ways to increase the accuracy of these appraisals. The use of
subjective data in an integrated feedback system is addressed, and
suggestions for further research are offered. Findings should be of
interest to training developers and evaluators.



EDITION OF 1 NOV 65 IS OBSOLETE



48 



UNCLASSIFIED

SECURITY CLASSIFICATION OF THIS PAGE (When Data Entered)



Technical Report 604

SUBJECTIVE APPRAISAL AS A
FEEDBACK TOOL

Billy L. Burnside

Submitted by:
Donald F. Haggard, Chief
ARI FIELD UNIT AT FORT KNOX, KENTUCKY

Approved by:
Harold F. O'Neil, Jr., Director
TRAINING RESEARCH LABORATORY

U.S. ARMY RESEARCH INSTITUTE FOR THE BEHAVIORAL AND SOCIAL SCIENCES
5001 Eisenhower Avenue, Alexandria, Virginia 22333

Office, Deputy Chief of Staff for Personnel
Department of the Army

May 1982

Army Project Number                     Education and Training
2Q26374A794

Approved for public release; distribution unlimited.



Research Reports and Technical Reports are intended for sponsors of
R&D tasks and for other research and military agencies. Any findings ready
for implementation at the time of publication are presented in the last
part of the Brief. Upon completion of a major phase of the task, formal
recommendations for official action normally are conveyed to appropriate
military agencies by briefing or Disposition Form.



The Fort Knox Field Unit has long been involved in the application of
experimental psychology to increasing the quality of the products of Army
Centers/Schools. These products include trained soldiers and training
materials. The training evaluation and feedback team of this unit performs
research and development on increasing the quality of these products by
improving the information flow between training developers and users in
the field.

In ARI Research Report 1323 (Burnside, 1981), it was determined that the
principal methods currently used to provide feedback from field personnel
to training developers involve the collection of subjective data. Such data
involve individuals' judgments or estimations, which may or may not be
objectively verified in particular instances. This approach is a
cost-effective one, but the accuracy of the data involved is a matter for
concern. This issue of the accuracy of subjective data must be resolved
before an integrated feedback system can be designed. Decisions must be
made as to when subjective data can be relied upon and when more objective
but costly methods must be applied.

This report provides background for the integration of subjective and
objective feedback methods by examining the accuracy of subjective data in
a variety of settings. Findings indicate that such data are frequently not
accurate and should be used cautiously. Included in the report are
suggestions for ways to increase the accuracy of subjective data, and these
have implications for TRADOC and other Army personnel concerned with the
evaluation of training and the flow of information between training
developers and users.

A wide range of data is summarized in this report. Significant assistance
in locating many of these data was provided by the peer reviewers, Dr. Jack
Hiller of the Presidio of Monterey Field Unit and Dr. Joel Schendel of the
Fort Benning Field Unit. They also provided many useful comments which have
been incorporated into the report. Acknowledgement is also extended to
Dr. Stephen Goldberg of the Fort Knox Field Unit for the provision of
unpublished data used in this report.

JOSEPH ZEIDNER
Technical Director







SUBJECTIVE APPRAISAL AS A FEEDBACK TOOL

EXECUTIVE SUMMARY

Requirement:

Feedback from field units to US Army Training and Doctrine Command (TRADOC)
Centers/Schools currently consists largely of subjective data, or
information which may be influenced by individuals' opinions or inferences.
In this report the accuracy of such data is examined, in order to determine
their utility as a feedback tool.

Procedure:

Relevant previously published and unpublished data are reviewed from a
variety of sources, including military research, educational research, and
cognitive psychology. These data are organized to address the accuracy of
subjective appraisals of individuals' proficiencies on specific tasks, as
well as the task performance frequency, difficulty, and criticality. Other
issues addressed are the relative accuracy of various sources of subjective
appraisals (self, supervisors, and peers) and the relative accuracy of
various appraisal methods (survey and interview techniques).

Findings:

Subjective appraisals of various aspects of task performance have been
found to be accurate in some instances. But, in general, accuracy of
subjective appraisals has not been reported consistently enough to support
their widespread use as feedback without further accuracy checks. The
relative accuracy of various subjective appraisal sources and methods has
also not been fully determined. Various proposals for further research and
for ways in which the accuracy of subjective appraisals may be increased
are included in the report.

Utilization of Findings:

This report will be useful to training developers and evaluators, to assist
them in obtaining meaningful feedback on various aspects of task
performance from the field. It will also be useful in guiding development
of an integrated feedback system and in guiding research on design of
cost-effective and accurate feedback tools.






SUBJECTIVE APPRAISAL AS A FEEDBACK TOOL

CONTENTS

INTRODUCTION

  Current Army Use of Subjective Appraisal
  Feedback Methods
  Problems with Subjective Appraisal
  Report Organization

TYPES OF APPRAISALS

  Proficiency Appraisals
  Task Criticality Appraisals
  Task Difficulty Appraisals
  Task Frequency Estimates
  Appraisal of Training Materials
  Tentative Conclusions
  Cognitive Psychology

TYPES OF APPRAISERS

  Relative Accuracy of Self-Appraisals
  Peer Appraisals
  Tentative Conclusions

TYPES OF APPRAISAL METHODS

  Surveys and Interviews
  Phrasing of Questions
  Raters' Experience
  Other Characteristics of Raters

CONCLUSIONS/RECOMMENDATIONS

REFERENCE NOTES

REFERENCES



INTRODUCTION

The purpose of this paper is to determine the accuracy and utility of a
particular evaluation method, subjective appraisal. Appraisal here refers
to the evaluation of the performance of individual soldiers and military
units on specific tasks in a field setting. This is distinguished from
assessment, which involves a general evaluation of individuals' personal
characteristics, knowledge, and abilities, such as the evaluation of
leadership abilities in an assessment center (Levine, 1980). The present
paper is primarily concerned with appraisal of task-specific job
performance, and not with more general assessment issues.

The terms "subjective" and "objective" will be used frequently throughout
this paper, and they have numerous connotations. It is thus necessary to
define their meanings carefully in the present context. Dictionary
definitions of "subjective" include "illusory" and "existing only within
the experiencer's mind and incapable of external verification." Such
negative connotations are not intended here. Rather, subjective appraisal
is defined as that which is based upon individuals' judgments or
estimations, and which can be but is not always externally verified.
Subjective appraisals are usually obtained through the use of surveys or
interviews in terms of some sort of rating scale. In contrast, dictionary
definitions of "objective" include "having to do with material objects,
actual existence, or observable phenomena" and "uninfluenced by emotion or
personal prejudice." Objective appraisal thus involves the actual
observation of performance and collection of performance data; i.e.,
verification external to individuals' opinions or estimations. For example,
one could simply ask a soldier whether he or she can perform a specific
task; this is what is meant by subjective appraisal here. Or one could
administer a hands-on test, observe the soldier's performance, and compare
it against a validated standard; this is what is meant by objective
appraisal. The distinction is analogous to that frequently made between
"soft" and "hard" data, with "soft" data consisting largely of individuals'
opinions and intuitive judgments and "hard" data consisting of performance
results in a controlled situation. Objective appraisal (or "hard" data)
provides in some sense the truest evaluation, since it is observable and
externally verified. But subjective appraisal (or "soft" data) is the more
efficient and cost-effective method. In some real world situations,
objective appraisal may be so costly and time-consuming as to be
practically impossible. A key question then becomes that of whether data
gathered during subjective appraisal are sufficiently accurate to warrant
their use in particular situations. This is a primary issue in the present
paper.

In actuality, the distinction between subjective and objective appraisal
is not as clear-cut as might have been implied above. Appraisal is perhaps
best described as a dimension with subjectivity at one end and objectivity
at the other. The difference between subjective and objective appraisal
methods is thus one of degree, with real-world methods representing various
mixes. Surveys can be made more objective by asking well-specified factual
questions and by using behaviorally anchored rating scales (Cascio, 1978).
Performance observation can be made less objective by using written
knowledge tests or simulated performance in lieu of actual "hands-on"
performance, or by using observational criteria or standards which require
judgments or inferences to be made. One could enter into protracted
philosophical arguments about the distinction between subjective and
objective appraisal; all subjective opinion is based upon experience to
some extent, and all objective performance observation and testing involves
judgment to some extent. Such arguments will be avoided here in the
interest of practicality. For practical purposes, the key question is not
how the methods differ in a theoretical sense or whether one method is
better than the other in an absolute sense, but rather what the appropriate
mix of methods is for a given situation.

Current Army Use of Subjective Appraisal

The use of subjective appraisal and assessment methods is ubiquitous in
the Army. The career performance of individuals is periodically assessed
with efficiency reports which utilize subjective rating scales and
narrative comments. The readiness of units is periodically assessed using a
Unit Status Report (AR ) which requires subjective estimates on the part of
the unit commander (Heymont, 1977). The collective performance of units on
specific exercises, such as Table IX for tank platoons and Army Training
and Evaluation Program (ARTEP) missions, is largely appraised subjectively
because the complexity of the performance would make objective appraisal
highly resource-intensive. Task analyses and front-end analyses for new
training programs are frequently based upon subjective appraisals. For
example, subjective estimates of the criticality and performance frequency
of specific tasks may be obtained by administering Comprehensive
Occupational Data Analysis Program (CODAP) surveys to field personnel.
Problem-solving techniques used in the Army, such as the estimate of the
situation (FM 101-5), also frequently require the subjective appraisal of
specific situations and courses of action. The use of subjective appraisal
is so widespread in the Army that it has in some respects been canonized,
is commonly referred to as "military judgment," and is sometimes espoused
by senior Army personnel as the only approach for analyzing complex
situations (West, Note 1).

The scope of this paper does not allow a review of subjective judgment in
the Army in all its manifestations. Rather, the use of subjective appraisal
will be examined in a specific context or situation, the feedback of
information from field units to Centers/Schools. The products of Training
and Doctrine Command (TRADOC) Centers/Schools can be grouped into two
categories: graduates and training doctrine, guidance, or materials. In
order to appraise the quality and utility of these products, elements of
the Centers/Schools need meaningful feedback from users in the field. This
constitutes the evaluation phase of the Instructional Systems Development
(ISD) process described in TRADOC Pam 350-30 and further delineated in
draft TRADOC Regulation 350-7. Elements collecting feedback from users may
include Directorates of Evaluation (DOEs), task analysts, training
developers, and special offices (e.g., the Office of Armor Force Management
and Standardization (OAFMS) at Fort Knox, KY). A preliminary review
indicated that the primary methods which such elements currently use to
gather feedback frequently include the use of subjective appraisals
(Burnside, 1981).



Feedback Methods

There are six principal methods which Centers/Schools may use to obtain
feedback from field units: receipt of informal comments, administration of
surveys/questionnaires, conduct of interviews, analysis of existing unit
performance records, observation of field performance, and operational
field performance testing. The first three of these methods, which
definitely involve subjective appraisal, are the most frequently used
according to battalion commanders and staffs (Burnside, 1981). The last two
methods are more objective in nature, but are not commonly used because of
their costs. The fourth method, analysis of existing records, may best be
described as a mix of subjective and objective appraisal, but it was found
to be of limited utility because of the limited availability,
standardization, and specificity of many records. Burnside (1981) reviewed
the general parameters and usage of available feedback methods. The present
paper provides further analysis of the accuracy of the most popular of
these methods; i.e., those involving subjective appraisal.

Problems with Subjective Appraisal

What are the general problems which may arise from the use of subjective
appraisal? Reviewers of the subjective judgment literature (e.g., Cascio
(1978), Holzbach (1978), and Thornton (1980)) have consistently described
several types of psychometric errors or problems which commonly occur.
Prominent among these are leniency errors, central tendency errors, halo
effects, and lack of interrater reliability. Leniency errors occur when
raters avoid using the low extremes of a rating scale, leading to a
restricted range or reduced variance of ratings. This tendency may
represent a systematic bias on the part of raters to avoid giving ratings
which can be interpreted negatively. The occurrence of leniency errors
among Army raters is exemplified by past distributions of officer
efficiency ratings, in which only the top few points of a 100-point scale
have been used. Similar to leniency errors are central tendency errors,
which represent a tendency of raters to avoid using both the high and low
extremes of rating scales. If there is no systematic bias against negative
ratings, there may still be a bias against extreme ratings and a tendency
for responses or judgments to cluster around the middle of the scale. Thus,
everything is rated about average, and the variance of ratings is again
reduced.

The halo effect occurs when a rater fails to distinguish among the
different dimensions of a situation and applies a global or overall
judgment based on one salient dimension. The ratings of different aspects
of a situation then tend to agree or correlate highly, whether this is
appropriate or not. For example, if a supervisor is asked to rate the
performance of a soldier on specific tasks, he or she may make the global
judgment that the soldier is a good worker and rate him or her high on all
tasks, even though performance of some of them may never have been
observed. Such a rating tendency detracts from the ability to discriminate
between different aspects of performance.
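
One rough way to screen a set of ratings for a possible halo effect is to
check how strongly the ratings given on supposedly distinct tasks correlate
with one another. The Python sketch below is illustrative only: the
ratings, the task names, and the helper functions are hypothetical and are
not drawn from any study reviewed in this paper.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson product-moment correlation between two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def mean_intertask_correlation(ratings_by_task):
    """Average correlation over every pair of rated tasks."""
    pairs = list(combinations(ratings_by_task.values(), 2))
    return sum(pearson(a, b) for a, b in pairs) / len(pairs)


# Hypothetical supervisor ratings of five soldiers on three tasks (1-7 scale).
# In the first set each soldier receives nearly the same rating on every task.
global_impression_ratings = {
    "task_a": [6, 6, 3, 5, 2],
    "task_b": [6, 7, 3, 5, 2],
    "task_c": [7, 6, 2, 5, 3],
}
# In the second set the rater distinguishes among tasks for each soldier.
differentiated_ratings = {
    "task_a": [6, 2, 5, 3, 4],
    "task_b": [3, 6, 2, 5, 4],
    "task_c": [4, 5, 6, 2, 3],
}

halo_index = mean_intertask_correlation(global_impression_ratings)
differentiated_index = mean_intertask_correlation(differentiated_ratings)
print(f"mean inter-task r, global impression: {halo_index:.2f}")
print(f"mean inter-task r, differentiated:    {differentiated_index:.2f}")
```

A high index does not by itself establish a halo effect, since soldiers may
in fact perform uniformly well or poorly across tasks. The index is best
read alongside more objective performance data where these are available.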

The lack of interrater reliability simply means that different raters do
not agree in their judgments. Without reliability, ratings are practically
useless; reliability sets the limit on the degree of validity which can be
obtained (Kitchell, 1979). For example, if a group of subject matter
experts do not agree on ratings of task criticality, then "truly" critical
tasks cannot be identified. Of course, agreement among raters does not
guarantee accuracy of ratings (Frick and Semmel, 1978). Raters can all
agree and all be wrong. So interrater reliability is a necessary but not
sufficient prerequisite to obtaining valid ratings.
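
The point that agreement does not guarantee accuracy can be made concrete
with a small numerical sketch. In the Python example below, two raters
agree closely with each other, yet their pooled ratings run opposite to an
external criterion. All values, variable names, and the choice of a simple
correlation as the index of agreement are assumptions made for
illustration; they are not taken from the studies cited above.

```python
from math import sqrt


def pearson(x, y):
    """Pearson product-moment correlation between two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Hypothetical criticality ratings of six tasks by two subject matter
# experts (1 = low, 5 = high).
rater_1 = [5, 4, 4, 3, 2, 1]
rater_2 = [5, 5, 4, 3, 2, 1]

# Hypothetical external criterion for the same six tasks, scored on the
# same 1-5 scale from more objective data.
criterion = [1, 2, 2, 4, 5, 5]

# Interrater agreement: how closely the two raters track each other.
agreement = pearson(rater_1, rater_2)

# Accuracy: how closely the pooled ratings track the criterion.
pooled = [(a + b) / 2 for a, b in zip(rater_1, rater_2)]
accuracy = pearson(pooled, criterion)

print(f"interrater agreement r = {agreement:.2f}")
print(f"rating-criterion r     = {accuracy:.2f}")
```

Here the raters would pass a reliability check while still misidentifying
the critical tasks, which is why agreement statistics need to be
supplemented by comparison with an independent criterion.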

One effect of the problems briefly described above is to reduce the amount
of correlation or agreement between subjective ratings and more objective
criteria. For example, a tendency which reduces the variance of ratings
generally reduces the degree to which they correlate with other measures.
These and other problems with the use of subjective ratings in feedback
will be further discussed in the context of specific sample data below.
Approaches for eliminating or reducing rater bias will be addressed in the
final section of the paper.
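
The link between reduced rating variance and reduced correlation can be
sketched with a small simulation. The model below is an assumption made
for illustration, not a description of any data set reviewed here: each
rating is taken to be true proficiency plus random rating error, and a
rater who avoids the extremes of the scale is represented by shrinking the
proficiency component while the rating error stays the same size.

```python
import random
from math import sqrt


def pearson(x, y):
    """Pearson product-moment correlation between two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


rng = random.Random(0)  # fixed seed so the sketch is repeatable
n = 2000

# Hypothetical objective criterion: true proficiency in standard units.
proficiency = [rng.gauss(0.0, 1.0) for _ in range(n)]
# Random rating error, identical for both kinds of rater.
error = [rng.gauss(0.0, 0.5) for _ in range(n)]

# A rater who uses the full scale tracks proficiency one-for-one.
full_range_ratings = [p + e for p, e in zip(proficiency, error)]
# A rater who bunches ratings together tracks proficiency only weakly
# (shrink factor 0.2 is an arbitrary illustrative choice).
compressed_ratings = [0.2 * p + e for p, e in zip(proficiency, error)]

r_full = pearson(full_range_ratings, proficiency)
r_compressed = pearson(compressed_ratings, proficiency)
print(f"full-range ratings vs. criterion: r = {r_full:.2f}")
print(f"compressed ratings vs. criterion: r = {r_compressed:.2f}")
```

Under this model the population correlations are 1/sqrt(1 + 0.25), or about
.89, for the full-range rater and 0.2/sqrt(0.04 + 0.25), or about .37, for
the compressed rater; the simulated values should fall close to these.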

Report Organization

There are numerous dimensions or sets of issues which could be used to
organize discussion of the area of subjective feedback. The organization
used in this report will center around the issues of what is being
appraised, who is doing the appraising, and how the appraisal is being
done. The type of appraisal of greatest interest here involves estimates of
soldiers' proficiencies on specific tasks. But other types of appraisals
are of interest to TRADOC Centers/Schools, at least during front-end
analysis, and these include judgments of the criticality, difficulty, and
performance frequency of specific tasks. Data pertaining to the accuracy of
all these types of appraisals are summarized in the next section. With
regard to the issue of who performs subjective appraisals, the most common
approaches in the feedback arena are self-appraisal and appraisal by
supervisors. Another approach which is not as common but may have
application as a feedback methodology is appraisal by peer group members.
Data collected from different appraisers will be compared in the second
section. Discussion of the issue of how subjective appraisals are done will
center around survey and interview techniques, and this paper will conclude
with suggestions for optimizing combined use of these approaches.

TYPES OF APPRAISALS

As outlined above, the types of appraisal of interest here, in terms of
what is being appraised, include estimates of task proficiency,
criticality, difficulty, and performance frequency. The data summarized
below are relevant to the accuracy of such estimates and were selected in
accordance with two criteria: they were obtained for specific military
tasks or tasks similar to those performed in the military, and they were
compared to more objective data obtained in the same study. In many cases
in the literature, the accuracy of subjective ratings has been assessed by
comparing them to other ratings. Such studies are de-emphasized here in
favor of those employing independent, objective criteria. At the end of
this section, research from the cognitive psychology literature relating to
humans' ability to make accurate subjective appraisals is discussed as
appropriate.



Proficiency Appraisals

A key element of feedback from field units to TRADOC Centers/Schools is
data relating to the proficiency with which soldiers can perform specific
required tasks. Such data are needed to allow elements of Centers/Schools
to evaluate both institutional training and unit training and to make
modifications as needed. Since the operational testing of soldiers'
performance in the field is costly in terms of time and resources,
proficiency data are usually gathered through subjective estimates. That
is, soldiers are asked to estimate their confidence or the likelihood that
they can perform specific tasks. Supervisors may also be asked to rate
soldiers' proficiencies. How accurately do such subjective appraisals
reflect actual task proficiencies? The relevant data summarized below
provide a mixed answer.

Pourchot and Lanning (1979) found that subjective proficiency estimates
correlate highly with performance test results in certain instances. Over
200 subjects rated their ability to use hand tools, a task of high
relevance to military jobs. These predictions correlated significantly with
scores on a performance oriented maintenance test. The authors concluded
that the accuracy of the performance appraisals was due to the explicitness
of tasks involving hand tools. This suggests that subjective proficiency
appraisals can provide accurate performance feedback if the tasks rated are
made sufficiently explicit.

Another task of some relevance to the military for which the accuracy of
subjective appraisals has been examined is clerical and typing abilities.
Levine, Flory, and Ash (1977) found significant positive correlations
between subjects' ratings of their abilities and written test scores in
areas such as spelling, grammar, reading, and arithmetic. They also found
that self-ratings of typing speed correlated at the .60 or higher level
with results of a standardized typing test. Ash (1980) further examined the
accuracy of self-appraisals of typing ability and found that such ratings
correlate moderately well with typing test scores. With a sample of over
150 high school students, self-ratings of straight copy typing ability
correlated in the .44 to .59 range with typing tests for alphabetic
material, but less than .30 with tests for numeric and tabular material.
There was also a lack of discriminant validity in this study. That is,
self-estimates of straight copy net words per minute correlated highly with
test results for typing of straight copy, letters, and revised manuscripts,
but self-estimates of ability to type letters, manuscripts, and numbers did
not correlate highly with corresponding test results. Subjects thus
demonstrated an ability to accurately appraise their basic straight copy
typing speed and accuracy, but they did not accurately appraise more
advanced typing abilities with which they had less experience. A leniency
error was also found in this study, since the mean straight copy
self-appraisal score was approximately 12 net words per minute higher than
the mean straight copy test score. A final finding of interest was that
minority group members' self-appraisals of straight copy typing abilities
were less accurate predictors of test scores than were majority group
members' appraisals. The primary conclusion to be drawn from these clerical
and typing studies is that subjects can appraise their own abilities with
moderate accuracy, as long as the tasks appraised are basic ones with which
the subjects have had extensive experience. Secondary conclusions are that
leniency errors may occur with such appraisals, and that minority group
members may appraise their abilities less accurately.
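
The two statistics discussed in the typing studies, the correlation between
self-appraisals and test scores and the leniency error expressed as a mean
difference, measure different things and can be computed side by side. The
Python sketch below uses invented self-estimates and test scores for eight
typists; none of the numbers come from the studies reviewed above, and the
variable names are hypothetical.

```python
from math import sqrt
from statistics import mean


def pearson(x, y):
    """Pearson product-moment correlation between two equal-length lists."""
    mean_x, mean_y = mean(x), mean(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Hypothetical data for eight typists, in net words per minute.
self_estimate = [55, 48, 62, 40, 70, 52, 45, 58]  # what each typist claims
test_score = [44, 35, 50, 31, 52, 43, 30, 47]     # what a typing test shows

# Relative accuracy: do typists who claim more actually type faster?
validity = pearson(self_estimate, test_score)

# Leniency error: how far, on average, the claims exceed tested speed.
leniency = mean(self_estimate) - mean(test_score)

print(f"self-estimate vs. test correlation: r = {validity:.2f}")
print(f"mean overestimate: {leniency:.2f} net words per minute")
```

In data of this kind the self-estimates can rank typists well while still
overstating every individual's speed, so a high correlation and a sizable
leniency error are not contradictory findings.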






Within the field of education a large body of research has been reported
which relates to the accuracy of subjective appraisals of proficiency. Much
of this research has limited relevance to the present review, since it
addresses appraisals of general knowledge obtained in a classroom rather
than appraisals of task-specific performance abilities. The problem of
obtaining an objective criterion to compare subjective appraisals against
is exacerbated when one is addressing general cognitive abilities rather
than "hands-on" or motor abilities. But despite this criterion problem,
educational research has provided some findings of relevance in a military
context, particularly since much military training is conducted in a
classroom and military skills are becoming more cognitively oriented. Thus,
educational research on subjective evaluation or appraisal is selectively
reviewed below.


Numerous studies have shown that at least some students can accurately
self-appraise their course performance. Moreland, Miller, and Laucka (1981)
found that good students were accurate in their self-appraisals, but poor
students were relatively inaccurate. The poor students understood the
course grading criteria, but for some reason they failed to accurately
apply these criteria to their own course work. Shaughnessy (1979) found a
similar result by obtaining confidence judgments along with answers to
multiple-choice questions. Confidence judgments were found to be moderately
accurate, and there was a strong positive relationship between confidence
judgment accuracy and test performance. Students who knew an answer knew
that they knew. Cohen (1981) reviewed the results of 14 studies in this
area and found that the mean correlation between self-appraisals and
student achievement on tests was .47. Students are at least moderately
accurate in appraising their performance on written tests, and good
students are relatively more accurate than poor students.

There is some evidence that teachers are not as accurate in subjectively
appraising classroom activities as students are. Hook and Rosenshine (1979)
found that teachers' perceptions of classroom activity were inaccurate
compared with perceptions of students and outside observers. For example,
teachers were found to be inaccurate in appraising the amounts of recitation,
discussion, and question answering that occurred in their classes. Teachers'
global ratings of classroom activity were found to be moderately accurate
compared with observers' ratings, but teachers' appraisals of specific activities
were found to be inaccurate. Hook and Rosenshine (1979) concluded that
teachers' appraisals of specific classroom activities should not be assumed to
correspond to actual practice. Shavelson and Dempsey-Atwood (1976) reached a
similar conclusion in a review of the relationships between teacher behavior
and student outcome measures. Measures of teacher behavior, including
subjective appraisals, were found to be unstable and inaccurate, with global ratings
showing the most stability. The appropriate overall conclusion from this line
of research is that teachers' appraisals of their specific classroom
proficiencies do not agree with outside observers' appraisals. Whether one
concludes that teachers are inaccurate or observers are inaccurate (or both),
this research provides evidence of the inaccuracy of subjective appraisals.

Cohen (1981) performed a meta-analysis of studies of the relationship
between student ratings of instruction and student achievement and found stronger
support for the accuracy of student ratings than had previously been published.
The average correlation between overall course ratings by students and student
achievement on written tests was .47, and the average correlation between
ratings of instructors and achievement was .43. This again supports the
accuracy of global ratings, although ratings of somewhat more specific areas such
as instructors' skill and course organization were also found to be accurate.
Three general variables which influenced the accuracy of course appraisals were
identified. Appraisals were more accurate for courses taught by experienced
instructors rather than graduate students, for courses in which achievement
tests were not graded by students' own instructors, and for courses in which
students gave their appraisals after they knew their final grades. The finding
of increased accuracy with the use of external graders could be attributed to
inconsistencies in grading practices among instructors. Such inconsistencies
would lessen the accuracy of subjective appraisals since they would result in
an unreliable criterion. The finding that students' appraisals are more
accurate when they know their final grade may indicate that teachers can buy good
evaluations by giving good grades. If students tend to evaluate what they have
learned based on what grade they have achieved, then the accuracy of evaluations
would be more appropriately measured in situations where students do not know
their final grades. In such studies the correlation between course appraisals
and achievement was found to be .38, indicating at best moderate accuracy.

Cohen's (1981) conclusion that students' appraisals of instruction are an
accurate index of students' proficiencies (i.e., what they learned from the
course) must be tempered in several respects. Most of the appraisals addressed
were global in nature and there are indications that students use global
factors such as the final grade achieved or expected in evaluating a course or an
instructor. Accuracy of global judgments may not be indicative of accuracy in
the type of task-specific performance of interest in the present paper. The
criterion used in studies of students' appraisals has most commonly been
achievement on a written test. Results from such studies may or may not
generalize to military situations in which the criterion is manual performance of
a task. And, as pointed out by Cohen (1981), achievement on a retention test
given at a later time may be a more valid criterion against which to compare
subjective appraisals than within-course tests are.

Educational research on the abilities of students and teachers to accurately
appraise their course proficiencies has provided somewhat mixed results. But
there are several indications that good students can accurately judge what they
have learned, in at least a global sense. Further research is needed to
determine if this result generalizes to a military context. Such research should
address specific tasks and use results of both immediate and delayed
performance tests as the criterion.


DeNisi and Shaw (1977) noted that subjective proficiency appraisals
addressed in previous educational and other research had generally dealt with
broadly defined or global abilities. They attempted to remedy this situation
by examining the accuracy of self-appraisals for more specific abilities, such
as visual pursuit, manual speed and accuracy, and spatial orientation. College
students used five-point scales to self-appraise their abilities on specific
tasks and were then tested on each task using ability tests commonly used in
industrial settings. Sample test items were used to ensure that each student
understood the specific abilities being appraised. Results showed that while
correlations between self-appraised and tested abilities were almost all
statistically significant, they were too small to be of any practical significance.
This finding demonstrates a problem with interpretation of studies of
subjective appraisal accuracy, many of which involve correlational analyses. While
DeNisi and Shaw (1977) considered correlations in the .20 to .40 range to be of
little practical significance, other researchers interpret such correlations
as indicating at least moderate accuracy of subjective appraisals (Cohen, 1981).
DeNisi and Shaw (1977) supported their interpretation by showing that the
self-appraisals failed to differentiate between students who subsequently scored
low or high on corresponding ability tests. That is, the predicted test score
for students rating themselves high in a given ability was within the 95 percent
confidence interval established around the predicted score for students rating
themselves relatively low (no one rated themselves below average, indicating a
leniency bias). The appropriate conclusion to be drawn from this study is thus
that self-appraisals are not sufficiently accurate to be substituted for tests
of specific abilities. The practical significance of correlations with a
magnitude of approximately .40 is a matter for debate. In line with DeNisi and
Shaw (1977), such correlations will not be interpreted in the present paper as
strongly supporting the accuracy of subjective appraisals.
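
One way to see why correlations of this size are open to debate is to square
them: the squared correlation is the proportion of variance in tested ability
that is statistically accounted for by the self-appraisal. The short Python
sketch below is an illustration only and is not taken from any of the studies
cited. The function name is hypothetical, and the inputs are simply the ends of
the .20 to .40 range discussed above plus the .47 reported by Cohen (1981).

```python
def variance_explained(r):
    """Return the proportion of criterion variance accounted for by a correlation r."""
    return r ** 2

# Correlation magnitudes mentioned in the text (illustrative inputs only).
for r in (0.20, 0.40, 0.47):
    print(f"r = {r:.2f} -> r squared = {variance_explained(r):.2f}")
```

Even at .40 the appraisal accounts for only about 16 percent of the variance
in test scores, which is consistent with the cautious interpretation adopted
in this paper.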

The research reviewed thus far in this section has dealt with general
knowledge or basic skills which were not appraised in a military setting. In
a study of more direct relevance to the Army, Gilbert and Downey (1978) looked
at the correlation between 10 measures of performance taken during Ranger
training and criterion measures obtained for the same group of officers three
years later. Unfortunately, this study did not provide a particularly useful
evaluation of the accuracy of subjective appraisal, since both the original and
subsequent sets of measures consisted largely of ratings by peers and superiors.
Correlations between these two sets of ratings ranged from .11 to .35,
indicating a lack of agreement over time, perhaps due to the use of two different
sets of raters (low interrater reliability). A halo effect may also have been
present, as ratings of individuals on 10 dimensions tended to be highly similar.
The validity or accuracy of the ratings could not be determined due to the lack
of an independent objective criterion, but the problems described above (low
interrater reliability and halo effect) and the fact that the components of
performance and their relative contribution to proficiency changed with
experience would necessarily limit validity coefficients.

In a study conducted for the US Navy, Hall, Denton, and Zajkowski (1978)
used achievement on a job knowledge test as a criterion for determining the
accuracy of subjective appraisals of proficiency. During a structured
interview, supervisors estimated the proficiency of 32 electricians and boiler
technicians on specific tasks. These estimates were compared to the sailors'
performance on written tests, and correlations were found to be low and
nonsignificant. The authors concluded that interview and written test methods did not
produce equivalent information about task proficiency. Comparison of
proficiency estimates with "hands-on" performance would have allowed more definitive
conclusions about the accuracy of subjective appraisals.




In another study of direct relevance to the Army, Medlin and Thompson
(1980) attempted to determine the major dimensions or factors that military
judges use in subjectively appraising ARTEP performance. A complex
multidimensional analysis of ratings based upon written narratives of ARTEP
performance indicated that military judges use only one general rating dimension,
indicating a possible halo effect. A general impression of unit performance
apparently is used to evaluate the unit, and more specific factors are used
only if no strong overall impression is made. Again, the accuracy of
subjective ARTEP evaluations could not be determined due to the lack of an
independent objective criterion in this study, but appraisals of specific aspects of
unit performance could not be expected to be accurate if they are based only
upon general impressions.

Caution should be applied in generalizing from the results of this last
study, since the appraisals were based upon brief written narratives and not
upon actual observation of field performance. But it and the previous studies
do demonstrate some important points about many studies of subjective appraisal
in a military setting. In many cases an objective criterion is not available
to allow full determination of the accuracy of subjective judgments. Ratings
are often compared with other ratings. But problems such as low reliabilities
and halo effects limit the accuracy that should be expected. The tasks for
which performance is subjectively appraised are also often not very specific
or explicit, again leading one to expect low judgmental accuracy. Summarized
below are studies which avoid these limitations by addressing task-specific
appraisals compared with objective performance measures.

Schendel and Hagman (in press) have reported at least indirect evidence
for the accuracy of task-specific subjective proficiency appraisals. Soldiers
were trained to assemble/disassemble the M60 machinegun and were then retention
tested and retrained several weeks later. Before they were retention tested,
soldiers were asked to estimate how much refresher training they would require
to regain proficiency on the task. These subjective estimates were highly
accurate. However, this result does not provide strong evidence for the
accuracy of subjective proficiency appraisals, due to the fact that limited
retraining was needed. An average of only two trials were required for
retraining, and soldiers knew from initial training experience that they would be shown
the correct procedure if they made an error during the first trial. It is thus
not surprising that soldiers were able to correctly estimate that they could
relearn the task within two trials. The accuracy of refresher training
estimates should be further addressed using tasks that require large numbers of
retraining trials.

Hiller (1980) developed algebraic models for determining the relative
benefits (in terms of time saved or lost) of alternative pretesting procedures;
i.e., ways of determining whether a soldier needs training on a specific task.
The alternative procedures analyzed included self-estimates of task proficiency,
written tests, and performance tests. While the original paper did not directly
address the relative accuracy of these appraisal methods, Hiller (Note 2) has
provided data which allow comparison of self-estimates and performance test
results for five specific tasks. Two of these tasks (organize and employ a
tank hunter-killer team) involve leadership skills, two (encode/decode and
authenticate messages with a KAL 1^ Coding Device) are primarily cognitive in
nature, and one (emplace/recover an M16A1 Anti-Personnel Mine) involves
"hands-on" motor skills. Self-estimates of proficiency were highly accurate for the
two leadership tasks; nearly everyone who said they could do each task passed
the performance test, and everyone who said they could not do each task failed
the performance test. But cognitive tasks showed considerably less accuracy
in self-appraisals; only 46% of those who said they could authenticate a
message could actually do so, while 50% who felt they could not do the task
passed the performance test. Corresponding results for encoding/decoding
messages were 37% and 25%. Finally, accuracy of self-estimates was especially
low for "hands-on" skills; only 2.3% of soldiers who said they could emplace
and recover a mine could actually do so, while 32% of those who said they could
not do this task were able to pass the performance test. So the accuracy of
subjective appraisal in this study depended upon the type of task being
addressed. Why did this occur? One possible reason is that the accuracy of
subjective appraisal declines as the criterion with which it is compared
becomes more objective. Leadership skills are difficult to develop standards
for and objectively evaluate; the high accuracy for self-appraisal of
leadership skills described above may have resulted from the comparison of two
subjective appraisals. That is, the performance tests for the two leadership
tasks may have been relatively subjective in nature. The performance test
standards for the cognitive skills would be expected to be more objective,
resulting in less accuracy of subjective appraisals. And the test standards
should be the most objective for the "hands-on" task, which showed the least
subjective appraisal accuracy. This interpretation of the results indicates
that subjective self-appraisal of proficiency on specific tasks is not accurate
when compared with an objective criterion. This conclusion is admittedly based
upon a small sample of tasks, so further relevant data are summarized below.
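
The percentages reported above can be put side by side with a simple index:
the pass rate among soldiers who said they could do a task minus the pass rate
among those who said they could not. This index is an assumption introduced
here for illustration and is not a statistic reported by Hiller. The Python
sketch below uses only the percentages quoted in the text; the leadership
tasks are omitted because exact figures are not given for them.

```python
# Percent passing the performance test, as quoted in the text:
# (among those who said they could do the task, among those who said they could not).
REPORTED = {
    "authenticate message": (46.0, 50.0),
    "encode/decode message": (37.0, 25.0),
    "emplace/recover mine": (2.3, 32.0),
}

def discrimination(pass_if_yes, pass_if_no):
    """Difference in pass rates (percentage points) between the two self-estimate groups."""
    return round(pass_if_yes - pass_if_no, 1)

for task, (yes_rate, no_rate) in REPORTED.items():
    print(f"{task}: {discrimination(yes_rate, no_rate):+.1f} points")
```

A value near zero or below zero means the self-estimate did not separate
soldiers who passed from soldiers who failed, which is the pattern seen for
the authentication and mine tasks.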


Shields, Goldberg, and Dressel (1979) examined the retention of 20 basic
soldiering skills by administering performance tests to soldiers in the field.
The tasks addressed included such basic skills as first aid, challenge and
password, donning the gas mask, and checking the field telephone. As a part
of this study, confidence ratings of proficiency (self-appraisals) were
obtained using a four-point scale for each task before it was tested. While the
report referenced above does not directly discuss the relationships between
confidence rating and task performance, some indication of inaccuracies in
self-appraisals can be gleaned from it. For example, 75% of the confidence
ratings collected indicated that a task could be performed fairly well or very
well, but only 37% of the tasks were correctly performed with no coaching
during the tests. This may be an indication of leniency errors. Goldberg (Note
3) has provided further analyses of the results of this study, and the
relationship between confidence and performance was found to be consistently low.
Correlations examined for several tasks ranged from -.30 to .q6. Goldberg
(Note 3) has also reported that later studies of retention of artillery skills
showed a similar lack of correlation between confidence judgments and task
performance. Correlations in the .40 to .50 range were found between averaged
confidence ratings and averaged performance scores, perhaps indicating some
ability to accurately appraise performance in general, but consistently low
correlations were found between confidence and performance on specific tasks.
It is interesting to note that the non-relationships described above have not
been discussed in published reports. Other retention studies (e.g., Rose and
Wheaton, 1978) have been found in which subjective appraisals of proficiency
were collected but their relationship to performance was not reported. It is
probably a safe conclusion that no significant relationships were found in such
studies, and that retention research in general has not found subjective
appraisals of proficiency to be accurate.

In summary, the data reviewed above indicate that subjective appraisals of
proficiencies (largely in terms of self-appraisals) on specific tasks often do
not represent true abilities. This appears to be especially true when the
subjective appraisals are compared to objective well-specified performance
criteria. If subjective appraisals are influenced by leniency errors (the
data above indicate that they are), and if the performance criteria are also
subjective and lenient, then a falsely high relationship can be expected
between these two measures. Before subjective appraisals are used as feedback
from field units to Centers/Schools, the relationship between such appraisals
and more objective measures of performance should be further examined. Such
examination should use task-specific performance tests with valid objective
standards. Self-ratings of proficiency may only be accurate when addressing
explicit tasks with which the ratees have extensive experience. This point
will be further addressed in a later discussion of ways to improve the utility
of subjective appraisals.

Task Criticality Appraisals 

Another type of subjective appraisal of concern to TRADOC Centers/Schools
is estimation of task criticality. Limited resources and time do not allow
training of all tasks in a given MOS in the training institution. Training
developers must thus somehow decide which tasks are most critical for combat
performance and therefore most important to train. This is typically
accomplished by preparing an extensive list of tasks and asking subject matter
experts to subjectively rate their criticality, usually by employing some sort of
rating scale. These experts may be drawn from personnel available in the
training institution, or feedback may be solicited from personnel in field
units (often through CODAP surveys). In either case, the judgments are based
upon field experience and thus represent subjective feedback from the field to
Centers/Schools. Just as with estimates of proficiency, one can question how
accurately subjective appraisals of criticality represent the "true" relative
importance of tasks.

Data are relatively sparse in this area, but those available have been
summarized by Harris, Osborn, and Boldovici (1978). These authors conclude
that a key problem with criticality estimates is that rater agreement
(interrater reliability) has generally been found to be low. They also conclude that
nothing is known about the predictive validity of criticality ratings, or the
degree to which such ratings correlate with more objective measures of task
criticality (of course, one of the problems here is developing objective
measures of criticality). Since such measures cannot be developed during
actual combat, they must be developed using simulations and war games, which
can be costly and time-consuming. But as long as the reliability of criticality
ratings is low, their accuracy or predictive validity also will be low. Harris,
Osborn, and Boldovici (1978) suggest several ways in which the reliability of
criticality estimates can be increased, such as using paired-comparison
techniques for determining the relative rather than the absolute criticality of
tasks. These techniques will be addressed in a discussion of ways to improve
the accuracy of subjective appraisals in a later section of this paper. The
important point for now is that the relevant data available do not suggest
that subjective appraisals of task criticality are reliable or accurate. If
accurate measures of task criticality are desired, further work is needed to
make criticality ratings more reliable and objective.
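
The dependence of validity on reliability noted above can be made concrete
with a standard result from classical test theory, which is not cited in this
paper and is introduced here only as an illustration: the observed correlation
between two measures cannot exceed the square root of the product of their
reliabilities. In the Python sketch below the function name and the
reliability values are hypothetical.

```python
import math

def max_validity(reliability_rating, reliability_criterion=1.0):
    """Upper bound on an observed validity coefficient under classical test theory."""
    return math.sqrt(reliability_rating * reliability_criterion)

# Hypothetical interrater reliabilities for a set of criticality ratings,
# paired with a perfectly reliable criterion (the default of 1.0).
for rel in (0.25, 0.49, 0.81):
    print(f"reliability {rel:.2f} -> validity can be at most {max_validity(rel):.2f}")
```

Under these assumptions, ratings with a reliability of .25 could not
correlate above .50 with even a perfectly reliable criterion, and the bound
drops further when the criterion itself is unreliable.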

Task Difficulty Appraisals 

The next type of subjective appraisal to be discussed here involves
judgments of the difficulties of tasks. Such appraisals are important to
Centers/Schools since the relative difficulty of tasks influences the distribution of
training time and resources. If particular tasks are more difficult for
soldiers to perform and retain, they should be given increased emphasis in the
training base or retrained more often in units. Appraisals of task difficulty
are often made subjectively; that is, training developers decide, based upon
their experiences and the opinions of subject matter experts, how training
resources should be distributed across tasks. How accurate are experts'
appraisals of task difficulty? The two sets of relevant data summarized below
indicate that the accuracy may be rather low.

Ryan-Jones (1979) obtained squad leaders' and platoon leaders' ratings of
difficulty for 18 basic infantry tasks and compared them with the percentage of
soldiers failing each task on the written component of the Skill Qualification
Test (SQT). The correlation between these two sets of measures was low (-.38),
indicating that experts' ratings of difficulty may not be representative of
actual task difficulty. This interpretation is based on the assumption that
the written component of the SQT is representative of actual task performance.
If this assumption were not correct, one could conclude that the experts were
right but the SQT is wrong. What is needed is a comparison of experts' ratings
with actual hands-on performance results. Harris, Campbell, and Osborn (1979)
accomplished this by comparing expert ratings obtained from training developers
and senior NCO's with performance results obtained during the Army Training
Study (ARTS, 1978). The experts' difficulty ratings were found to be
unreliable and unrepresentative of performance. For example, when experts were asked
to select the most difficult element of a task, they selected the element most
often performed wrong only 16% of the time. Using a more lenient criterion,
they selected one of the three most commonly failed elements of a task only
45% of the time. Thus, indications are that subject matter experts are not
accurate in appraising the difficulty of performing tasks or elements within
tasks. It may be that experts' conceptions of tasks differ from those of
novices, leading experts to be unable to predict where relative novices will
encounter difficulties. In any case, experts' ratings of task difficulty
should not be accepted as accurate without further comparison with objective
performance data.
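
The strict and lenient criteria described above amount to a "top-k" hit rate:
the share of tasks for which the element an expert names as hardest is among
the k elements soldiers actually failed most often. The Python sketch below
shows one way such a rate could be computed. It is not the scoring procedure
used by Harris, Campbell, and Osborn, and the task names, element names, and
failure counts are invented for illustration.

```python
def top_k_hit_rate(expert_picks, failure_counts, k):
    """Share of tasks where the expert's pick is among the k most-failed elements."""
    hits = 0
    for task, pick in expert_picks.items():
        counts = failure_counts[task]
        # Rank elements from most to least often failed.
        ranked = sorted(counts, key=counts.get, reverse=True)
        if pick in ranked[:k]:
            hits += 1
    return hits / len(expert_picks)

# Hypothetical data: failures per task element, and the element one expert called hardest.
failure_counts = {
    "task_a": {"step1": 12, "step2": 30, "step3": 5, "step4": 9},
    "task_b": {"step1": 2, "step2": 4, "step3": 21, "step4": 15},
}
expert_picks = {"task_a": "step1", "task_b": "step3"}

print(top_k_hit_rate(expert_picks, failure_counts, k=1))  # strict criterion
print(top_k_hit_rate(expert_picks, failure_counts, k=3))  # lenient criterion
```

With these invented data the expert is right on one of two tasks under the
strict criterion and on both under the lenient one, which shows how much the
choice of criterion can change the apparent accuracy of expert judgments.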

One possible reason for the lack of reliability and accuracy that has been
found in ratings of the difficulty of tasks may lie in the way that difficulty
has been subjectively appraised (Hiller, Note 4). When one is asked to judge
the difficulty of a task, one can interpret and answer the question in various
ways. The task may be difficult to train or teach, difficult to learn, or
difficult to perform once learned. These differing interpretations of
difficulty will not always lead to the same subjective judgments. For example,
learning to encode/decode and authenticate messages is fairly difficult, but
these tasks are easy to perform after they are learned. Conversely, learning
how to locate an anti-personnel mine is easy, but performance of the task is
painstaking, stressful, and difficult. If subject matter experts rating tasks
such as these differ in their interpretation of whether they are judging
learning or performance difficulty, their ratings will not agree and interrater
reliability will suffer. Thus, when appraisals of task difficulty are obtained,
the difficulty dimension should be operationally defined in terms of teaching,
learning, or performing. In this way the reliability and accuracy of these
appraisals can perhaps be increased. This hypothesis is supported by Hiller
(1974), who found that students' ratings of text readability (difficulty)
corresponded to objective measures of comprehension on both immediate and delayed
retention tests. The accuracy of these appraisals may have been due to the
definition of difficulty in terms of a dimension (readability) for which the
raters shared a common understanding.

Task Frequency Estimates

Developers of training programs may need to know how frequently specific
tasks are performed in the field, in addition to how critical and difficult
they are. Tasks which are performed frequently generally require less
sustainment training. Tasks which are not performed frequently may be important ones
to include in unit training. If an infrequently performed task is also a
critical one for combat performance, a unit training program should be
developed for it in order to lessen retention problems. So frequency considerations
can interact with those of criticality and difficulty.

There are few data available relating to the accuracy of subjective task
frequency judgments. Various studies of skill retention (e.g., ARTS, 1978;
Rose and Wheaton, 1978) have obtained such judgments from soldiers in the field
in order to examine the effects of practice upon retention. Little
relationship has typically been found between these two variables, which may indicate
that no relationship exists, or that the frequency estimates obtained have not
been accurate. Turney and Cohen (1978) also obtained data of relevance by
comparing self-estimates of work effort and time with actual performance
duration for three tasks in a computer facility. The correlations of estimates
and actual effort were in the .30 to .40 range, indicating only moderate
accuracy in self-appraisals of time and effort expended.

It is very difficult to obtain objective measures of task performance
frequency, since one would be required to observe the activities of individuals in
a unit and count task performances over a long period of time. Unit records
are generally not detailed enough to provide task performance frequency counts.
Job books might be expected to provide such data, but they are often incomplete
and difficult to consolidate (Burnside, 1981).






Only one study has been identified which directly compared subjective
estimates of task performance time and frequency with observed performance in a
field setting. Johnson, Tokunaga, and Hiller (1980) reviewed the available
literature and concluded that objective methods were needed to validate
self-appraisals of time spent performing specific tasks, since previous studies
indicated that such appraisals were not likely to be accurate. They then asked
a sample of 95 officers and NCO's in Infantry companies and Artillery batteries
how often they performed each of a large set of tasks in a typical month, and
how long it took to perform each task once. These two estimates were combined
by the researchers to obtain absolute estimates of the time spent on each task
in a typical month. These estimates were compared with data obtained by
observing the activities of 56 personnel within their units. Personnel were
observed for an average of about four hours each, and the dominant behavior
within each ten minute interval was recorded. The tasks addressed in the
subjective estimates of frequency and time spent were categorized into broad
content areas for comparison with the observational data. The rank order
correlations between subjective estimates and observational data were found to range
from .65 to .90 for various levels of personnel, indicating that the estimates
were highly accurate. The estimates were found to inflate the absolute amount
of time spent at work, but they were reliably related to the observation
criterion. Converting the time estimates to proportions by dividing them by the
total time estimates yielded a truer picture of the distribution of time across
tasks.
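
The steps described above (multiply frequency by duration, convert to
proportions of the total, then compare rank orders with observation) can be
sketched in a few lines of Python. The exact computations used by Johnson,
Tokunaga, and Hiller are not given here, so this is only a minimal sketch
under stated assumptions: the content areas and all numbers are hypothetical,
the function names are invented, and the rank correlation uses the textbook
Spearman formula for data without ties.

```python
def monthly_minutes(times_per_month, minutes_each):
    """Combine the two simple estimates into total minutes per month for each category."""
    return {c: times_per_month[c] * minutes_each[c] for c in times_per_month}

def proportions(values):
    """Convert absolute amounts to shares of the total."""
    total = sum(values.values())
    return {c: v / total for c, v in values.items()}

def ranks(values):
    """Rank categories from 1 (largest) downward; assumes no tied values."""
    ordered = sorted(values, key=values.get, reverse=True)
    return {c: i + 1 for i, c in enumerate(ordered)}

def spearman(x, y):
    """Spearman rank-order correlation for untied data over the same categories."""
    rx, ry = ranks(x), ranks(y)
    n = len(x)
    d2 = sum((rx[c] - ry[c]) ** 2 for c in x)
    return 1 - 6 * d2 / (n * (n ** 2 - 1))

# Hypothetical self-estimates and observed time shares for four content areas.
times = {"maintenance": 20, "training": 12, "admin": 30, "inspection": 4}
minutes = {"maintenance": 60, "training": 120, "admin": 30, "inspection": 45}
observed_share = {"maintenance": 0.40, "training": 0.35, "admin": 0.15, "inspection": 0.10}

estimated_share = proportions(monthly_minutes(times, minutes))
print(estimated_share)
print(spearman(estimated_share, observed_share))
```

Because proportions sum to one regardless of how inflated the absolute
estimates are, dividing by the total removes any uniform overstatement of
time at work while leaving the rank order of the categories unchanged.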

Why did Johnson, Tokunaga, and Hiller (1980) find that subjective estimates
of time spent performing tasks were accurate when this result has not been
found elsewhere? Two possible reasons can be identified. First, the
comparison of time estimates and observational data was accomplished in terms of broad
categories of tasks, and not for specific tasks. It may be that time estimates
are more accurate for general tasks than for specific tasks. Further research
with precise observational data would be necessary to determine if this is the
case. Secondly, Johnson, Tokunaga, and Hiller (1980) broke the time spent
estimates down into two estimates, one for how often a task is performed and
one for how long a typical performance takes. These two estimates may be
relatively simple to give and thus relatively accurate. If this is the case, we
have evidence that frequency estimates can be relatively accurate and that
subjective estimates in general can be made more accurate by asking more precise
questions. More research using objective observational criteria is needed to
further address these indications.

Appraisal of Training Materials

All the types of subjective appraisal discussed above are related to some
aspect of performance on specific tasks. TRADOC Centers/Schools also have a
mission to appraise the quality of individual and collective training materials
they produce, such as Soldiers' Manuals, ARTEP's, commanders' guides, and crew
drills. The appraisal of these materials is also accomplished largely through
subjective approaches, such as the receipt of informal comments and the
administration of questionnaires (Burnside, 1981). The issues addressed for
materials are similar to those addressed for task performance, such as the
criticality of the information in the documents, the frequency of documents'
use, and the degree to which they enhance mission performance. One can address
the accuracy of subjective appraisals of these issues for associated training
materials as well as for task performance, although little research has been
done in this area.

One study of relevance (Shvern, 1979) examined evaluations of a combat
commander's guide obtained via a questionnaire. There was an indication that
sections of the guide were not evaluated independently, since they tended to be
rated the same. This is evidence of a halo effect, similar to those described
earlier. Another finding was that each rating depended largely upon the unique
measure used and its context, making generalization difficult. Some of the
problems encountered in subjective appraisals of task performance may also
occur in subjective appraisals of materials. Conclusions and suggestions
offered in this paper should thus be applied to both areas of evaluation.

Tentative Conclusions

What can one conclude about the accuracy of various types of subjective
appraisal? One appropriate conclusion is that directly relevant data are
scarce. Few studies have gathered comparative data using an objective criterion
in order to directly analyze the accuracy or validity of subjective data. But
studies which do allow such comparisons, as well as studies of other aspects
of subjective appraisals (e.g., reliability and halo effects), indicate that
subjective data are often inaccurate. There is some indication that subjective
appraisals may be at least moderately accurate when they address explicit tasks
with which the appraiser has extensive experience. But there is also some
indication that subjective appraisals become less accurate as they are compared
to more objective criteria. And there is evidence of the types of errors
discussed in the first section of this paper in subjective appraisals gathered
in a military setting. Raters tend to disagree with each other (low interrater
reliability), tend to make general judgments without distinguishing among the
different aspects of a situation (halo effect), and tend to provide positively
biased ratings (leniency error). Obviously, further research is needed to
identify the extent of such problems in subjective appraisals, and to identify
ways of reducing or eliminating them. Initial steps in this direction are
discussed in a later section of this paper.
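
The three rating problems named above can each be given a rough numerical
index. The sketch below is one possible way to do so and is not a method taken
from the studies reviewed: it assumes interrater reliability is approximated by
the correlation between two raters, halo by the average correlation between
rated dimensions, and leniency by the mean excess of ratings over an objective
score on the same scale. All function names and data are hypothetical.

```python
from statistics import mean


def pearson(xs, ys):
    """Pearson correlation; raises ZeroDivisionError if either list is constant."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def halo_index(ratings_by_dimension):
    """Mean correlation between every pair of dimensions; near 1 suggests halo."""
    dims = list(ratings_by_dimension.values())
    pairs = [(a, b) for i, a in enumerate(dims) for b in dims[i + 1:]]
    return mean(pearson(a, b) for a, b in pairs)


def leniency(ratings, objective_scores):
    """Mean amount by which ratings exceed objective scores on the same scale."""
    return mean(r - o for r, o in zip(ratings, objective_scores))


# Hypothetical 1-5 ratings of four soldiers (illustrative only).
rater_a = [4, 5, 3, 4]
rater_b = [3, 5, 2, 5]
agreement = pearson(rater_a, rater_b)

one_rater = {"speed": [4, 5, 3, 4], "accuracy": [4, 5, 3, 4], "safety": [5, 5, 4, 4]}
halo = halo_index(one_rater)

inflation = leniency([4, 5, 5], [3, 3, 4])
```

Low `agreement`, high `halo`, and positive `inflation` would correspond to the
three problems listed in the paragraph above.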

Cognitive Psychology

Subjective self-appraisal, or the estimation of one's own abilities to
perform specific tasks, would likely be classified as introspection in the
experimental psychology literature. Introspection involves the observation by a
person of his or her thoughts and feelings and verbal reports or behavior
describing them. This technique was widely utilized during the early days of
experimental psychology, but was abandoned following behaviorism's emphasis on
the analysis of objective behavior. However, rebirth of interest in the study
of unobservable mental processes within cognitive psychology during the past
twenty years has led to a reemergence of research on the accuracy of
introspective reports. Most of this research has been directed toward
introspections of higher cognitive processes such as problem-solving, but it
may have some relevance to introspections of task-specific abilities.






Lieberman (1979) has issued a call for a limited return to introspection
as an experimental technique, since it may be accurate in some instances. For
example, people are able to accurately appraise and state how they will vote,
as shown by the accuracy of polls. There are several examples of accurate
subjective appraisals in the cognitive psychology literature. Carver (1972)
reported that subjective estimates of the percent of thoughts understood during
reading correlated .98 with a test measuring the amount of information stored.
This finding demonstrates an ability to subjectively appraise the difficulty
of a highly familiar task such as reading. Kroll and Kellicutt (1972) showed
that people were able to accurately predict how well they could recall verbal
material by reporting how many times they had implicitly rehearsed it.
Lachman, Lachman, and Thronesbery (1979) found that people who couldn't recall
the answers to general knowledge questions were able to accurately predict
whether they would recognize the correct answers. They also were found to
spend more time searching memory for answers they thought they knew than for
answers they thought they did not know, which perhaps led to a self-fulfilling
prophecy. Both Robinson and Kulp (1970) and Gardiner and Klee (1976) found
that people are able to accurately recognize most of the items from a verbal
list that they previously recalled on a free recall test.

The evidence summarized above indicates that people can accurately appraise
their past and future memory abilities, at least when familiar verbal material
is involved. This higher-level knowledge of memory abilities has been
christened metamemory (Flavell, 1970). Metamemory has been shown to be accurate
for general knowledge and frequently used memory abilities, and for episodic
(Tulving, 1972) tasks such as recall or recognition of verbal items presented
in lists. Is metamemory available and accurate for complex motor skills which
may not have been practiced extensively? Metamemory for specific motor
abilities may be available only in a general sense. That is, soldiers might
know that they had performed a task before and be able to verbally describe its
general characteristics, but still be unable to accurately appraise whether
they can perform the task, due to forgotten details or misunderstood standards.
The characteristics of metamemory for complex skills and the extent to which
accurate introspections can be derived from it are important topics for future
research. As pointed out by Lieberman (1979), introspection should not be
totally rejected as an inaccurate technique, but rather the conditions under
which it is likely to be accurate and useful should be identified. In order
to do this, introspective reports should be supplemented and verified by other
behavioral or circumstantial evidence whenever possible.

While arguments for the use of introspection in some instances certainly
have merit, the accuracy of this technique is still a subject of debate in the
cognitive psychology literature. Kahneman and Tversky (1973) have argued that
subjective judgments and predictions are based upon general heuristics rather
than upon specific evidence available. Their research shows that one predicts
by selecting the outcome that is most representative of the input, even when
this outcome is statistically unlikely. For example, subjects were asked to
predict the major area of study for a particular student, based upon a written
personality description. When the personality description was stereotypical
of that for an engineer, subjects predicted that the student was an engineering
major. They persisted in this prediction, even when told that the frequency
of engineering students was very low and that the personality description might
not be accurate. Kahneman and Tversky (1973) concluded that prior probabilities
are ignored when stereotypical evidence is available, even if that evidence
is worthless.
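
To see why ignoring prior probabilities matters, the engineering-major example
can be worked through with Bayes' rule. The sketch below is an illustration
only; the base rate and likelihoods are hypothetical numbers chosen for the
example, not values reported by Kahneman and Tversky (1973), and the function
name `posterior` is an assumption.

```python
def posterior(prior, p_desc_given_h, p_desc_given_not_h):
    """Bayes' rule for a binary hypothesis such as 'student is an engineering major'."""
    evidence = prior * p_desc_given_h + (1 - prior) * p_desc_given_not_h
    return prior * p_desc_given_h / evidence


# Hypothetical base rate: 5 percent of students are engineering majors.
base_rate = 0.05

# Description fits engineers four times better than other students (assumed).
diagnostic = posterior(base_rate, 0.8, 0.2)

# Description is worthless: equally likely for engineers and non-engineers.
worthless = posterior(base_rate, 0.5, 0.5)
```

With a worthless description the posterior equals the base rate, and even with
a fairly diagnostic description the low base rate keeps the posterior well
below one half. A judge who predicts "engineer" in either case is responding to
the stereotype rather than to the probabilities.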

Extrapolating from the findings described above to the sorts of task-specific
self-appraisals of interest in the present paper, it may be that soldiers
estimate their proficiencies in terms of what they should be able to do
rather than in terms of what they can actually do. That is, if a soldier is
asked whether he can properly perform a particular task, he may respond
positively because he feels that a soldier with his level of experience should
be able to perform it. He may not have actually thought out whether or how he
could perform the task. The soldier may respond on the basis of a stereotype
or implicit theory about the abilities of soldiers at his level. Nisbett and
Wilson (1977) have supported such a contention with research showing that people
do not base reports of their cognitive processes on true introspections.
Rather, their reports are based on implicit causal theories about the extent
to which particular stimuli are plausible causes of specific responses. They
describe introspection as nothing more than judgments of plausibility and
conclude that "the accuracy of subjective reports is so poor as to suggest that
any introspective access that may exist is not sufficient to produce generally
correct or reliable reports" (p. 233). Accurate subjective reports would then
only occur incidentally as the result of use of a correct implicit theory about
behavior. Such reports could not be expected to be generally accurate if
people cannot introspect about their mental processes. But this is not the
end of the matter. Smith and Miller (1978) have challenged Nisbett and Wilson's
(1977) conclusions on theoretical and methodological grounds, and they have
argued that people can accurately introspect about their mental processes in
some instances. These instances include tasks which are novel, engaging, and
not overlearned, so that the mental processes involved are not automatic and
unconscious. These authors suggest that research be oriented not on the
question of whether people can introspect about mental processes, but rather on
the question of the conditions under which such introspection is accurate.

In summary, what does the cognitive psychology literature offer that has
relevance to the sorts of subjective appraisal of interest here? First, a
caveat mentioned above should be repeated. Research on subjective judgment
within cognitive psychology has primarily addressed higher mental processes.
Findings in this context may or may not directly relate to judgments about
abilities which are more motor or "hands-on" in nature. However, many of
today's military tasks are cognitively oriented, so findings from the cognitive
research literature should have some application in a military setting.
Analyses of the accuracy of subjective judgments in cognitive settings have
produced mixed results and have not yet provided convincing evidence that such
judgments are accurate. Lieberman (1979) and Smith and Miller (1978) have
suggested that debates about the general accuracy of subjective judgments should
be replaced by research addressing the conditions under which such judgments
can be accurate. The present paper will attempt to encourage movement in this
direction by describing ways in which subjective appraisals may be made more
accurate. The military and cognitive research literature will be integrated
in the development of these suggestions after review of findings concerning
types of appraisers and appraisal methods.




TYPES OF APPRAISERS

A primary consideration in the use of subjective appraisals is the sources
from which they are collected. In situations such as the gathering of
subjective appraisals as feedback from military units in the field, three
general alternative sources are available: soldiers evaluating themselves
(self-appraisal), supervisors, and peer group members. For example, suppose that
Center/School personnel wish to economically appraise soldiers' proficiencies
on specific tasks. Soldiers could be asked to subjectively appraise their own
performance on the tasks, supervisors could be asked to appraise the
performance of soldiers working under them, or soldiers could be asked to
appraise the performance of their co-workers or peers. A previous review
indicates that the first two of these alternatives are the ones most commonly
utilized by TRADOC Centers/Schools (Burnside, 1981). The previous section of the
present paper summarized data relevant to the absolute accuracy of subjective
appraisals. This section summarizes data relating to the relative accuracy of
appraisals obtained from alternative sources, particularly supervisor versus
self-appraisals.


What are the relative plusses and minuses in utilizing self-appraisals
versus subjective appraisals gathered from other sources? A primary benefit
of self-appraisals pointed out by numerous authors (e.g., Levine, 1980; Primoff,
1980; Shrauger and Osberg, 1981) is that individuals have extensive data
available about themselves and can provide information that is unavailable from
other sources. We observe ourselves continuously in our daily work settings,
while supervisors and peers may have limited opportunities to observe our
performance. Given basic self-observation and memory capabilities, we should
then have more information available relating to our abilities than any other
source. However, a note of caution is appropriate here. Recall that some of
the cognitive psychology literature summarized earlier (e.g., Nisbett and
Wilson, 1977) calls into question our ability to introspect about our own
capabilities, at least those that are cognitive in nature. But until this issue
is resolved, we can at least theoretically expect self-appraisals to benefit
from the relatively large amount of information available. A related potential
advantage of self-appraisals is that individuals generally attend to
situational factors in their own behavior, whereas outside observers may not be
aware of such factors (Wills, 1978; Shrauger and Osberg, 1981). Individuals
might thus be expected to be more accurate in appraisals of their own
abilities, since outside observers might tend to over-generalize across
situations. In fact, Wills (1978) has shown that observers tend to regard small
samples of others' behavior as sufficient evidence for generalized personality
dispositions. Supervisors and peers may similarly tend to over-generalize about
abilities based upon a small sample of data. A final more practical advantage
of the self-appraisal approach is that it is likely to be more economical, in
terms of time and resources, than are other approaches.

One major disadvantage of self-appraisals is that alluded to above; i.e.,
people may not be capable of appraising themselves competently. We may not be
aware of many of our cognitive and motor abilities, since some of them may be
automatic and unconscious. Further basic research will be necessary to resolve
this concern; thus far, research and theories relating to our ability to
evaluate task-specific proficiencies are almost nonexistent. The second major
concern with the use of self-appraisals is the possibility of response biases.
We may have more information about ourselves than anyone else has, but we also
have a tendency to bias our appraisals in a positive direction. This would
result in leniency error of the sort described earlier.

Relative Accuracy of Self-Appraisals

Shrauger and Osberg (1981) have examined the utility of obtaining
self-appraisals and appraisals from other sources in a variety of situations.
Since some of these situations have at least indirect relevance to the military,
the review's conclusions will be summarized here. In the area of academic
achievement, self-appraisals were found to predict academic performance at least
as well as most projective tests that have been utilized. But self-appraisals
did not do as well when compared with previous performance in the same
situation. That is, college grades were better predictors of future college
grades than self-predictions were. Self-appraisals did show higher predictive
accuracy than performance in a previous situation; i.e., self-appraisals
predicted college grades better than high school grades did. This leads to the
conclusion that self-appraisals may be useful when performance indicators
gathered in the situation of concern are not available. Self-appraisals of task
proficiencies may be accurate relative to results of written knowledge tests,
but not relative to results of actual "hands-on" performance.

With respect to the use of self-appraisals to predict actual job
performance, Shrauger and Osberg (1981) found few data available in settings
other than the Peace Corps. And the results from this setting were not found to
be particularly useful, since they were not consistent and involved comparison
of self-appraisals with appraisals by peers and supervisors, rather than with
more objective measures of on-the-job performance. Conclusions reached in this
area were that sufficient data are not available to determine how well people
can appraise their performance relative to appraisals developed by evaluation
boards of supervisors and peers, and that surprisingly few data are available,
in general, to address the usefulness of self-predictions of job performance.

After comparing self-appraisals with other methods of prediction in
numerous areas, Shrauger and Osberg (1981) found that 29 studies showed
self-appraisals to be more accurate, while 10 favored other appraisal methods.
This result seems to support the use of self-appraisals, but two caveats are in
order. First, the accuracy of self-appraisal was found to vary with the type
of behavior being predicted. Self-appraisals did well in general areas such as
vocational choice and judgment of personality traits, but were found to be
inconsistent in more specific areas such as job performance in the Peace Corps.
Second, no adequate comparisons of self-appraisals with objective measures of
job performance were found. The predictive accuracy of self-appraisals has
been compared with predictions derived from projective tests, evaluation
boards, and other general assessment techniques, but it has seldom been
compared with objective measures of actual job performance. The conclusion that
self-appraisals are as good as other appraisal methods may indicate that all
methods are equally poor, and not that self-appraisals are accurate.




One study conducted in a military setting has supported the relative
accuracy of self-appraisal techniques, but it also suffers from a weakness
discussed above. Dyer and Hilligoss (1979) obtained self-appraisals and other
predictors of job performance for over 400 officers and NCO's in an assessment
center. The criterion with which these predictions were compared was field
leadership performance ratings obtained from superiors, peers, and subordinates
of these personnel six to 18 months after assignment to a unit. Again, the
criterion is not really objective and what we have is essentially a comparison
of two sets of subjective ratings. Results showed that 11 to 14 percent of
two types of self-appraisal measures correlated significantly with the
criterion, while only nine percent of assessment exercises and seven percent of
peer ratings provided successful predictors. This result might be used to
argue for the relative accuracy of self-appraisals, but more interesting is
the low predictive accuracy of any method. Even when using another subjective
measure as the criterion, only a small percentage of self-appraisal measures
were found to accurately predict future performance.


Thornton (1980) has provided a thorough review of the accuracy of
self-appraisals of job performance using the framework of types of errors or
problems discussed earlier in the present paper (i.e., leniency errors and halo
effects). This framework will be used here to summarize his conclusions and
those of other authors, where appropriate. With regard to leniency errors,
many studies have shown that individuals rate themselves higher than they are
rated by others. Self-ratings have been shown to be higher than ratings by
supervisors, peers, and assessment center raters. Holzbach (1978) also
concluded that self-ratings are more lenient than ratings by supervisors or
peers, and that supervisor and peer ratings do not differ significantly. Meyer
(1980) has summarized years of research which led to the conclusion that most
people have an unrealistically positive perception of their own job performance.
He found that typically at least 40 percent of employees rate themselves as
being in the top ten percent of performers, and that almost no one rates
themselves as being below average. He also found that publicly announced
self-appraisals tend not to be as positively biased as those given in
confidence.

This last finding reported by Meyer (1980) brings up an important point
about the accuracy of self-appraisals. Although self-appraisals have generally
been found to exhibit leniency errors, this is not always the case (Van Rijn,
1980). Special measures can be taken to reduce the occurrence of such errors.
For example, self-appraisals may be less lenient if the rater knows that his
or her supervisor may see the ratings. Leniency errors can also be reduced if
the rating scale does not require the rater to compare himself or herself to
an average task performer. People are hesitant to rate themselves as being
"below average," but may be willing to rate themselves as "better than 25
percent of task performers." Ratings may also be less inflated or lenient if
they are verifiable (Van Rijn, 1981). The accuracy of at least a sample of
any set of obtained self-appraisals should be compared with objective measures
of performance, such as a "hands-on" test. If raters are aware that they will
be tested on task performance after giving their self-appraisals, they may tend
to be more accurate. Both Mitchell (1979) and Frick and Semmel (1978) reported
a related finding that observers report the behavior of others more accurately
when they know that the accuracy of their observations is being checked. The
accuracy of self-appraisals should be checked, if at all possible, in order to
reduce leniency errors.



Unless special measures are taken to eliminate them, leniency errors are
likely to be a serious problem when using self-appraisals. In fact, the
problem may be even more severe than is indicated by the literature (Van Rijn,
1981). In most of the relevant research, self-appraisals have been gathered
in experimental settings in which raters know that their self-ratings will have
no real effect on aspects of their future job environment, such as promotional
opportunity. If self-appraisals were to have a real impact on the job, the
tendency for inflation of ratings might become even more evident. In a
military setting, soldiers might inflate their self-ratings of task proficiency
if they felt that this would in any way increase their opportunity for
promotion. They might also inflate self-ratings in order to avoid participating
in retraining for tasks they feel they cannot do. The problem of leniency in
ratings means that great care should be taken in utilizing self-appraisals in
the real world. Measures such as those suggested above should be applied to
reduce leniency, but further research is needed to determine the effectiveness
of such measures in real-world settings.

summarized several sets of data allowing comparison of
supervisory and self-appraisals, and he concluded that there may be more random
error in supervisory ratings. This appears to be due to supervisors having
inadequate opportunity to observe the behavior being appraised. MacLane (1977)
operationally defined unreliability of appraisal as an error in which raters
gave different ratings to the same ratee for different statements concerning
the same dimension. Supervisors demonstrated errors or rating inconsistencies
in 27 percent of their appraisals, while the self-appraisal error rate was only
nine percent. Supervisors seemed to lack information about the people they
were rating, and they were frequently unable to support their appraisals with
examples of behavior on the job. Self-raters were able to provide such
support; as stated earlier, one advantage of self-ratings is that people have
extensive information available about themselves. Self-appraisals may be more
accurate than supervisory appraisals in situations where individuals have
extensive experience performing the tasks being appraised and supervisors have
not had extensive opportunities to observe task performance.
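
The operational definition of unreliability attributed to MacLane (1977) above
lends itself to a simple count. The sketch below is an interpretation, not
MacLane's actual scoring procedure: it assumes an "error" means any difference
at all among a rater's ratings of the same ratee on statements tapping the same
dimension. The function name `inconsistency_rate`, the ratee names, and the
ratings are hypothetical.

```python
def inconsistency_rate(appraisals):
    """Share of (ratee, dimension) groups whose statements got different ratings.

    `appraisals` maps (ratee, dimension) to the list of ratings one rater gave
    to the statements written for that dimension.
    """
    groups = [r for r in appraisals.values() if len(r) > 1]
    if not groups:
        raise ValueError("need at least one dimension rated on two or more statements")
    errors = sum(1 for ratings in groups if len(set(ratings)) > 1)
    return errors / len(groups)


# Hypothetical supervisor ratings (illustrative only).
supervisor = {
    ("Smith", "initiative"): [4, 2],  # inconsistent
    ("Smith", "teamwork"): [3, 3],
    ("Jones", "initiative"): [5, 5],
    ("Jones", "teamwork"): [2, 4],    # inconsistent
}
supervisor_rate = inconsistency_rate(supervisor)
```

Computing the same rate separately for supervisor ratings and self-ratings
would give the kind of comparison described in the paragraph above.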



Thornton (1980) found that in the few studies which have reported variance
in ratings, most found less variation in self-appraisals than in appraisals
from other sources. However, the halo effect has generally been found to be
lower for self-ratings. Holzbach (1978) and Van Rijn (1980) found
that appraisals by supervisors tend to show a greater halo effect than
self-appraisals do. This result is probably related to the earlier finding
that people tend to be aware of specific situational determinants of their
own performance and are thus less willing to over-generalize than external
observers are. Halo effects are thus not as large an area of concern for
self-appraisals as for subjective appraisals from other sources. The reasons
for a reduced halo effect occurring in conjunction with reduced variance in
ratings are unclear and need further examination.



In reviewing studies which directly addressed the relative accuracy of
various appraisal sources, Thornton (1980) reported finding inconsistent
results. Eleven studies showed a lack of agreement between self-appraisals and
appraisals from supervisors or peers, while seven studies found at least partial
agreement between rating sources. Other studies have shown that self-ratings
are often not reliable or stable, and thus could not be expected to demonstrate
validity. These findings suggest that job holders have a different view of
their job performance than other people do, and that self-appraisals should be
used very carefully. Evidence for the accuracy of self-appraisals is at this
point meager (Van Rijn, 1981). Further work is needed to identify those
situations in which self-appraisals may be accurate.

Peer Appraisals

The discussion above has centered around the self-appraisal approach, since
this is the method most commonly used for gathering subjective appraisals.
Another method which has not been frequently used in gathering feedback by
Centers/Schools but which deserves further consideration is peer appraisal.
The research summarized above indicates that peer appraisals are more similar
to supervisor appraisals than they are to self-appraisals, and that the
relative accuracy of these different approaches has not adequately been
addressed. Reviews of the peer evaluation literature have provided mixed
conclusions about the characteristics of this approach. Downey and Duffy (1978)
concluded that peer appraisal methods have demonstrated substantial validity and
thus provide a useful tool for predicting performance. Lammlein and Borman
(1979) found that peer ratings show high interrater agreement and provide good
predictions of future performance. They did not provide enough detail on the
studies reviewed to indicate how they reached this latter conclusion. Kane and
Lawler (1978) reviewed some of the same literature and reported that no studies
included an adequately objective measure of performance. The research on
accuracy of peer appraisals compared to objective criteria thus appears to be
open to differing interpretations. Kane and Lawler (1978) also reported that no
studies have allowed a direct comparison of the accuracy of supervisory and peer
ratings, while Lammlein and Borman (1979) concluded that ratings from these two
sources correlate moderately well. The relative accuracy of peer appraisals is
still a subject of debate; reviewers looking for objective criteria have found
no reason to conclude that such appraisals are accurate. Peer ratings may have
some characteristics (e.g., high interrater agreement) which make their use
desirable in feedback systems. However, as with self-appraisals, peer
appraisals should be used carefully in conjunction with a check on their
accuracy, since their general accuracy has not been consistently demonstrated in
the research literature thus far.

Tentative Conclusions

Research on the relative accuracy of subjective appraisals gathered from
various types of sources has left many questions unanswered. It is difficult
to address the relative accuracy of appraisal sources when the absolute
accuracy of each of them is undetermined. What is needed is a study which
includes the collection of supervisory, peer, and self-predictions of
proficiencies on specific tasks, followed by objective measures of task
performance. The literature thus far has generally failed to include objective
criteria for comparison purposes, and until it does the accuracy issue will be
unresolved. Self-appraisals usually suffer from leniency biases, and peer and
supervisory appraisals may suffer from tendencies to over-generalize from small
samples of data. Accuracy of these approaches should thus not be assumed, but
should be checked against relatively objective criteria.
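
The analysis step of the study called for above could take roughly the
following form. This is a sketch under stated assumptions, not a design taken
from the literature: it assumes predictions and objective results are expressed
on the same scale (here, a percent score on a hands-on test) and uses mean
absolute error as the accuracy measure. The function names and all numbers are
hypothetical, and the ordering they produce is an artifact of made-up data.

```python
from statistics import mean


def mean_absolute_error(predicted, observed):
    """Average size of the gap between predicted and objectively measured scores."""
    return mean(abs(p - o) for p, o in zip(predicted, observed))


def rank_sources(predictions_by_source, objective_scores):
    """Return (source, error) pairs ordered from most to least accurate."""
    errors = {
        source: mean_absolute_error(preds, objective_scores)
        for source, preds in predictions_by_source.items()
    }
    return sorted(errors.items(), key=lambda item: item[1])


# Hypothetical hands-on test scores for four soldiers, and predictions of them.
objective = [60, 75, 90, 50]
predictions = {
    "self": [80, 85, 95, 75],
    "supervisor": [65, 70, 80, 60],
    "peer": [55, 80, 85, 60],
}
ranking = rank_sources(predictions, objective)
```

Because every source is scored against the same objective criterion, the
comparison addresses absolute accuracy as well as relative accuracy.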




TYPES OF APPRAISAL METHODS 

The final issue to be addressed relates to methods which can be used in
collecting subjective appraisals. The data reviewed thus far suggest that
subjective appraisals should not be indiscriminately used as feedback to
Centers/Schools, since the accuracy of such appraisals is yet to be fully
determined. But subjective appraisals are going to be used in the real world,
due to the relative ease and economy with which they can be collected. Thus,
authors such as Lieberman (1979) and Smith and Miller (1978) are correct in the
assertion that it is more fruitful to identify methods and situations which
allow one to maximize the reliability and accuracy of subjective judgments,
rather than to debate at length the general accuracy of such judgments. In
keeping with this suggestion, the remainder of this paper will concentrate upon
methods for increasing the accuracy of subjective appraisals. Methods discussed
in this section will lead to recommendations and suggestions summarized in the
next section.

Surveys and Interviews

Since surveys and interviews are the most commonly used approaches for
gathering subjective feedback data, the first issue to be addressed here is
which of these methods should be used in specific situations. Survey data have
the advantage of being easy and economical to collect, particularly if they
are gathered through the mail. However, data summarized by Burnside (1981)
indicate that response rates to mailed surveys are often so low as to make
this approach to gathering feedback inadequate. In order to gather survey
data from a representative sample, it is generally necessary for a data
collector to be on-site in the field. The interview approach has the advantage
of allowing collection of more in-depth responses, but it is considerably more
resource-intensive. Interviews are usually conducted in a one-on-one setting,
and this leads to extensive time commitments on the part of data collectors.
But this may be time well spent. Burnside (1981) found that battalion staff
personnel feel that they give more thoughtful and in-depth answers to interview
questions than to survey questions. These personnel are sometimes so inundated
with surveys that they do not take time to respond to them carefully, if at
all. The use of interviews may thus in some cases result in collection of
more valid data.

Hall, Denton, and Zajkowski (1978) conducted a direct comparison of feedback
data gathered by mailed questionnaire and structured interview techniques
for several tasks in the Navy. Results indicated that these approaches produced
equivalent data pertaining to the adequacy of initial training, the frequency
of task performance, and supervisors' appraisals of on-the-job proficiency.
However, the interview used here was essentially an orally administered
survey, so equivalence of results is not surprising. Problems were encountered
in obtaining a satisfactory return rate for surveys, demonstrating a common
problem with this technique. This study shows that equivalent subjective
appraisals can be obtained in response to written or oral questions, if one can
get around the problem of low return rate of surveys. But a more interesting
issue than how survey and interview responses can be made equivalent is how
they can be designed to supplement each other. Surveys can be used to obtain
a general overview of where problem areas lie. Interviews can then be used to
obtain more in-depth data on specific problems and the reasons for them.
Incidentally, Hall, Denton, and Zajkowski (1978) not only found that survey
and interview responses were equivalent, but they also found that proficiency
ratings obtained in interviews did not correlate significantly with results of
written knowledge tests. When surveys and interviews are used to gather
subjective feedback data, a check on the accuracy of such data should be
included. A total feedback system should thus use surveys, interviews, and
objective tests in conjunction.

Phrasing of Questions

Another important methodological issue in the collection of subjective
appraisals is the nature of the questions asked. Meyer (1980) has provided an
example of how this variable can influence the value of the information
gathered. Self-appraisals which involve the comparison of one's abilities with
those of others on specific tasks often lead to leniency errors. But comparison
of one's own relative strengths on different tasks may lead to reasonably
accurate and useful ratings. Questions should perhaps be phrased to ask
self-appraisers to compare their own relative strengths in abilities, rather
than to compare their abilities to those of others. When a rating scale requires
a respondent to compare his or her performance with the performance of others,
the respondent must have knowledge, not only of his or her own abilities, but
also of others' abilities. Since such scales require an assumption of
additional knowledge, they should be avoided where possible.

Bernardin, Beatty, and Jensen (1980) suggested that subjective rating
instruments should be based upon a thorough job analysis, and Primoff (1980)
provided some further recommendations in this direction. Designers of
subjective appraisal questions should be certain that they have an understanding
of job elements in common with that of raters. A question designer who is an
expert on the tasks addressed may have a different concept of adequate task
performance than a rater who is a relative novice. If possible, rating scales
should be phrased in terms of explicit behavioral measures of performance
rather than in general terms such as "can do the task with no problems." Or,
raters could be asked to provide specific experiential evidence supporting
their claims that they can perform particular tasks. Appraisals based on
observable behaviors are more closely related to task performance than are
appraisals based on general factors, such as inferred personality traits (Van
Rijn, 1980). A common base or standard for ratings should be ensured between
question developers and raters. If raters are asked whether they can perform
a task to standard, care should be exercised to ensure that they have the
correct standard in mind. Care should also be exercised to ensure that all raters
interpret the rating dimension similarly. As described earlier, a general
dimension such as task difficulty can be interpreted in various ways, so it
should be operationally defined to raters.

Shrauger and Osberg (1981) have recommended ways in which questions can be
phrased to maximize accuracy, in addition to the general suggestion that the
situation and behavior to be predicted should be specified exactly. There is
some evidence that ratings of maximal behavior result in more accurate
predictions of future actions than do ratings of typical behavior. Developers of
appraisal questions should be aware of whether the criterion they are interested
in involves maximal or typical functioning. Questions designed to obtain
predictions of performance in stressful combat situations may not lead to
responses which correlate with day-to-day peacetime performance. Question
developers and respondents should have a common understanding of the situations
for which behavior is being predicted, and criterion measures should be obtained
in the same situation. Questions should also be specific as to the action being
predicted and the target of that action. Research has shown that attitudes
correspond more closely to behavior as actions and targets are specified in
greater detail (Ajzen and Fishbein, 1977). The implication of this finding
for subjective appraisals of proficiency is that the action or behavior to be
predicted should be specified in detail, along with a clear definition of when
the action is completed and what the result is.

Relevant to this discussion of how to design questions to maximize the
accuracy of subjective appraisals is a technique applied by Harris, Osborn, and
Boldovici (1978). As described earlier, these authors found that rater
agreement was typically low in studies of subjective criticality estimations. To
get around this problem, they used a paired-comparison technique in which
raters compared tasks to one another rather than rating each task on a
numerical scale. That is, two tasks were described in a well-defined situation
and subjects were asked to identify the more critical one. In this way, relative
rather than absolute criticality ratings were obtained, the judgment process
was simplified, and an operational definition of criticality was provided.
Results showed that use of this method increased interrater reliability
considerably, to higher than the .90 level in some cases. The effects of using
this technique on the accuracy or predictive validity of criticality
estimations were not directly addressed, but an approach which increases the
reliability of subjective appraisals would be expected to also have a positive
impact upon validity. One operational problem with this approach is the extent
to which complete pairings of tasks can be presented for comparison. With more
than a few (six or eight) tasks being evaluated, the number of pairs becomes
so large as to preclude presentation of them to all raters. In this case,
some method of partial pairing must be used, and the best way to do this is
not always clear. So this technique would best be utilized when a small number
of tasks are being compared. It could easily be adapted to situations where
the performance proficiency, frequency, or difficulty of specific tasks is
being appraised, as well as the criticality.
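
The growth in the number of pairs can be made concrete with a short sketch.
Complete pairing of n tasks requires n(n-1)/2 comparisons, so six tasks need 15
pairs, eight need 28, and twenty would need 190. The code below enumerates the
pairs and ranks tasks by a simple count of how often each was chosen as the
more critical one. The win-count tally and the task names are assumptions made
for illustration; they are not the scoring procedure reported by Harris,
Osborn, and Boldovici (1978).

```python
from collections import Counter
from itertools import combinations


def all_pairs(tasks):
    """Return every unordered pair of tasks; n tasks give n*(n-1)/2 pairs."""
    return list(combinations(tasks, 2))


def rank_by_wins(tasks, judgments):
    """Order tasks by how often each was judged the more critical of a pair.

    `judgments` holds one winner per presented pair. Ties keep the
    original task order because Python's sort is stable.
    """
    wins = Counter({task: 0 for task in tasks})
    wins.update(judgments)
    return sorted(tasks, key=lambda task: wins[task], reverse=True)


# Hypothetical task names, used only for illustration.
tasks = ["boresight", "load", "clear", "zero"]
pairs = all_pairs(tasks)

# Number of pairs required for complete pairing at several list lengths.
pair_counts = {n: n * (n - 1) // 2 for n in (4, 6, 8, 20)}

# One rater's hypothetical choices, one winner per pair.
ranking = rank_by_wins(tasks, ["load", "load", "load", "clear", "clear", "zero"])
```

A tally of this kind yields only a relative ordering of tasks, which matches
the relative rather than absolute nature of paired-comparison judgments.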

Raters' Experiences 

Another major variable impacting upon the accuracy of subjective appraisals
is the extent to which raters share common experiences. This variable has most
commonly been addressed in terms of training provided to raters before they
provide subjective appraisals. Cascio (1978) reviewed the effects of such
training programs and concluded that training for raters is most beneficial
when it includes practice with the specific rating scales to be used,
discussions of errors commonly made by raters, and emphasis upon distinguishing
among the different aspects or dimensions of a situation. Research results
indicate that training programs designed in accord with these recommendations
reduce the amount of halo effect and other errors in subjective ratings.
Bergman and Siegel (1972) concluded that training programs are effective to
the extent that they eliminate idiosyncrasies in the way raters observe their
own or others' behavior. There are also indications that the degree or type
of training impacts upon its effect. For example, Bernardin and Walter (1977)
found that one hour of training on the nature of psychometric errors resulted
in significantly less halo error in subsequently obtained ratings. But
exposure to the scale to be utilized in addition to one hour of training
resulted in less leniency error and higher interrater reliability, in addition
to reduced halo error. So training in making subjective appraisals can be
expected to have a positive impact upon their accuracy. This training should
include a general discussion of the types of errors commonly made and experience
with the specific rating scale to be used. If a large number of subjective
appraisals are being collected over a long period of time, training should be
provided during the rating period as well as before it. Research summarized by
Frick and Semmel (1978) has shown that reliability of ratings may decrease as a
function of time since training.
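
Interrater reliability of the kind discussed above can be monitored during a
long rating period with a simple index. The sketch below uses the mean of the
pairwise Pearson correlations between raters who rated the same tasks. This is
only one of several possible indices and is an assumption here; the studies
cited do not necessarily use it. The function names are hypothetical.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length lists of ratings."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def mean_pairwise_reliability(ratings_by_rater):
    """Average the correlations over every pair of raters.

    Each inner list holds one rater's ratings of the same tasks,
    in the same task order. At least two raters are required, and
    each rater's ratings must vary across tasks.
    """
    correlations = [pearson(a, b) for a, b in combinations(ratings_by_rater, 2)]
    return sum(correlations) / len(correlations)
```

Computing this index separately for ratings gathered early and late in the
collection period would show whether agreement is falling off and refresher
training is due.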

Other Characteristics of Raters 

Shrauger and Osberg (1981) have summarized several other characteristics
of raters which may influence the accuracy of subjective appraisals. One
important consideration is whether raters have the intellectual or cognitive
capacity to effectively appraise their own and others' performance. Most
studies of the accuracy of subjective appraisals have used subjects of above
average educational and intellectual levels. These studies have generally
found low accuracy, and the accuracy might be even less for samples of soldiers,
many of whom have not completed a high school education. This hypothesis is
supported by Gorsuch, Henighan, and Barnard (1972), who found that the
reliability of a scale depended upon the reading ability of the raters. Errors
of measurement were found to be small for good readers, but were large for poor
readers. Further research is needed to address the relationship between level
of education and ability to make accurate subjective appraisals.

Another individual characteristic which has been found to influence the
accuracy of self-appraisals is the degree of raters' self-consciousness or
self-awareness. While this variable may be difficult to operationalize, it
could perhaps be delineated in terms of experiences on specific tasks.
Individuals would be expected to provide more accurate subjective appraisals for
tasks with which they have extensive experience, and they should never be asked
to appraise tasks with which they have little or no experience. Data supporting
this point have been reported by Primoff (1979). He found that job applicants
were moderately accurate in self-appraising their abilities on familiar
tasks, such as spelling, but were not accurate on less familiar tasks, such as
comparing names and numbers. Ash (1980) reported similar results for typing
tasks. Supervisory appraisals should also be expected to be more accurate for
familiar tasks on which performance has been observed frequently, as shown by
the research of MacLane (1977) described earlier. The consistency of the
appraised individuals' behavior will also impact upon appraisal accuracy; such
accuracy should be higher with tasks for which behavior is consistent rather
than highly variable. Consistent experience with tasks will not facilitate
appraisals unless raters can remember it. Recall of relevant previous
experience should be facilitated before appraisals are given. This can be done by
asking raters to review their behavior in previous relevant situations or by
providing them with memory cues, such as descriptions of the tasks being
addressed and situations in which they are commonly performed.

Motivation is another factor which can influence the accuracy of subjective
appraisals. The need for accuracy should be strongly emphasized in
instructions provided before ratings are collected. The accuracy of at least a
selected sample of subjective ratings should be checked against objective
criteria, such as performance test results. Raters should be informed that such
a check will be conducted, in order to maximize their desire for accuracy.
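
The check on a selected sample can be sketched as follows. The sketch assumes
that both the self-appraisal and the performance test are recorded as a simple
pass/fail (True/False) result for each rater; that dichotomous form, the
function names, and the two summary statistics are illustrative assumptions,
not a procedure prescribed in the research reviewed here.

```python
import random


def select_check_sample(rater_ids, k, seed=0):
    """Draw k raters at random for follow-up performance testing.

    A fixed seed makes the draw repeatable for record keeping.
    """
    rng = random.Random(seed)
    return sorted(rng.sample(list(rater_ids), k))


def accuracy_summary(self_reports, test_results):
    """Compare self-appraisals with test results for the checked raters.

    Both arguments map a rater id to True (can perform / passed) or
    False. Only raters present in both mappings are compared.
    """
    checked = [r for r in self_reports if r in test_results]
    if not checked:
        raise ValueError("no rater has both a self-report and a test result")
    n = len(checked)
    agreements = sum(self_reports[r] == test_results[r] for r in checked)
    # Claims of proficiency that the test did not confirm (leniency).
    overclaims = sum(self_reports[r] and not test_results[r] for r in checked)
    return {"n": n, "agreement": agreements / n, "overclaim_rate": overclaims / n}
```

The overclaim rate gives a rough indication of leniency in the self-appraisals,
while the agreement rate indicates how far the remaining, unchecked ratings can
be trusted.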

In summary, while the degree of accuracy of subjective appraisals is yet
unknown, it can be maximized through the application of methodologically sound
data collection approaches. Some of these techniques were described above and
will be summarized as recommendations in the next section. Further research
is needed to determine the exact relationship of these approaches to the
accuracy of subjective appraisals. Using these techniques to collect subjective
appraisals in conjunction with the collection of more objective comparative
data will provide many of the data that are needed.

CONCLUSIONS/RECOMMENDATIONS

The data reviewed in this paper lead to at least three major conclusions
with respect to the accuracy of subjective appraisals. The first of these is
that adequate data are not yet available to determine either the absolute
accuracy of subjective appraisals or the relative accuracy of different
appraisal sources. The biggest problem here is the general lack of objective
criteria to which subjective data can be compared. In many studies, subjective
ratings have been compared to other ratings or to data which only approximate
objective criterion data, such as written test results. When ratings from
different sources have been compared to each other, results show that
self-appraisals differ somewhat from peer and supervisory appraisals. But
ratings have not in general been compared to sufficiently objective criteria to
allow definitive statements on their accuracy or predictive validity. Research
is badly needed which allows comparison of subjective ratings or predictions to
relatively objective sets of criterion data, such as results of "hands-on"
performance tests.

The second major conclusion is that the limited research which has directly
addressed the accuracy of subjective appraisals has in general not found it to
be high. Results for appraisals of the performance proficiency, frequency,
difficulty, and criticality of specific tasks all support this conclusion.
Various types of psychometric errors have commonly been found in subjective
appraisals. The general lack of interrater reliability limits the amount of
accuracy or validity that can be expected in subjective appraisals. People
have difficulty distinguishing among the various aspects or dimensions of an
appraisal situation, which often leads to halo effects. A leniency error or
positive bias has frequently been found, especially in self-appraisals. Before
conclusions are drawn based upon subjective appraisals in any situation, the
accuracy of the data should be checked. This check should involve a comparison
of subjective data with independently gathered data that are as objective in
nature as possible.

The final conclusion is that while the available data relating to the
accuracy of subjective appraisals are not definitive, there are ways to increase
this accuracy. Subjective appraisals will always be used because of the ease
and economy with which they can be collected. Further research is needed, but
available research results suggest several general ways in which the accuracy
of subjective appraisals can be increased. These are summarized below, and
their application to the collection and use of subjective appraisals is
strongly recommended.

1. Integrate mutually supportive subjective appraisal methods within a
feedback system. Since no appraisal method is complete and sufficient in and
of itself, methods should be used to complement each other. Surveys can be
used to obtain a general overview of the situation, interviews can be used to
obtain more in-depth detail on specific problems, and observations and
performance tests can be used as accuracy checks.

2. Ensure that question developers and subjective appraisers have a common
base of understanding. These groups should share a common understanding of
task elements, successful task completion, appropriate standards, and rating
dimensions. If any of these factors are unclear, misleading data may result.

3. Design questions to maximize accuracy. Make the situation and behavior
being addressed as explicit as possible, and specifically state the action
being addressed and the target of that action. With a small number of tasks,
consider using a paired-comparison rather than an absolute rating technique.
With a larger number of tasks, consider asking raters to compare their own
strengths and weaknesses, rather than to compare their abilities to those of
others. Also, consider asking appraisers to rate their maximal rather than
their typical behavior.

4. Make rating scales as explicit as possible. Phrase rating scales in
terms of explicit observable measures of performance, rather than in vague,
general terms such as "average," "below average," etc. Describe each rating
point in terms of the behavior that it represents. Consider asking raters to
provide specific examples of experiences which support their ratings.

5. Be sure the raters have had experience with the tasks rated. Give
raters the option of indicating that they have not had experience with any
given task, and thus cannot provide a rating for it. Be sure that supervisors
have had ample opportunity to observe task performance by the people they are
rating.

6. Train raters before they provide subjective appraisals. This training
should include experience with the rating scales to be used, a discussion of
common types of psychometric errors (halo and leniency effects), and a
discussion of the dimensions of the situation being evaluated. Provide refresher
training to raters if a large number of ratings are being collected over a long
period of time.



7. Facilitate raters' recall of relevant experiences. Ask raters to review
their previous experiences, provide them with thorough descriptions of
the tasks and situations being rated, and provide any other memory cues which
aid recall.

8. Make certain that appraisers have the cognitive capacity and motivation
to provide accurate ratings. Be sure that they can understand the questions
asked and the use of rating scales. Explain the need for accurate rating data
during instructions. If the accuracy of the subjective ratings will be checked,
let the raters know this.



REFERENCE NOTES 



1. West, Arthur L., Jr., MG, USA (Ret.). Personal communication, March 1971.

2. Hiller, J. H. Personal communication, December 1981.

3. Goldberg, S. L. Personal communication, December 1981.

4. Hiller, J. H. Personal communication, February 1982.



REFERENCES

Ajzen, I. & Fishbein, M. Attitude-behavior relations: A theoretical analysis
     and review of empirical research. Psychological Bulletin, 1977, 84,
     888-918.

Ash, R. A. Self-assessments of five types of typing ability. Personnel
     Psychology, 1980, 33, 273-282.

Bergman, B. A. & Siegel, A. I. Training evaluation and student achievement
     measurement: A review of the literature (Technical Report 72-3). Lowry
     Air Force Base, CO: Air Force Human Resources Laboratory, January 1972.

Bernardin, H. J., Beatty, R. W., & Jensen, W. The new uniform guidelines on
     employee selection procedures in the context of university personnel
     decisions. Personnel Psychology, 1980, 33, 301-316.

Bernardin, H. J. & Walter, C. S. Effect of rater training and diary-keeping
     on psychometric error in ratings. Journal of Applied Psychology, 1977, 62,
     64-69.

Burnside, B. L. Field performance feedback: A problem review (Research
     Report 1323). Alexandria, VA: US Army Research Institute for the
     Behavioral and Social Sciences, August 1981. (ADA 134 388)

Carver, R. P. Implications of a new technique for measuring the understanding
     gained from reading for non-residential programs. Washington, D. C.:
     American Institutes for Research, 1972. (ERIC Document Reproduction
     Service No. ED 064 383)

Cascio, W. F. Applied psychology in personnel management. Reston, VA: Reston
     Publishing Company, Inc., 1978.

Cohen, P. A. Student ratings of instruction and student achievement: A
     meta-analysis of multisection validity studies. Review of Educational
     Research, 1981, 51, 281-309.

DeNisi, A. S. & Shaw, J. B. Investigation of the uses of self-reports of
     abilities. Journal of Applied Psychology, 1977, 62, 641-644.

Downey, R. G. & Duffy, P. J. Review of peer evaluation research (Technical
     Paper 342). Alexandria, VA: US Army Research Institute for the Behavioral
     and Social Sciences, October 1978. (ADA 061 780)

Dyer, F. K. & Hilligoss, R. E. Using an assessment center to predict field
     leadership performance of Army officers and NCO's (Technical Paper 372).
     Alexandria, VA: US Army Research Institute for the Behavioral and Social
     Sciences, May 1979.




Flavell, J. H. Developmental studies of mediated memory. In H. W. Reese &
     L. P. Lipsitt (Eds.), Advances in child development and behavior (Vol. 5).
     New York: Academic Press, 1970.

Frick, T. & Semmel, M. I. Observer agreement and reliabilities of classroom
     observational measures. Review of Educational Research, 1978, 48, 157-184.

Gardiner, J. M. & Klee, H. Memory for remembered events: An assessment of
     output monitoring in free recall. Journal of Verbal Learning and Verbal
     Behavior, 1976, 15, 227-233.

Gilbert, A. C. F. & Downey, R. G. Validity of peer ratings obtained during
     Ranger training (Technical Paper 344). Alexandria, VA: US Army Research
     Institute for the Behavioral and Social Sciences, October 1978. (ADA P6r^6)

Gorsuch, R. L., Henighan, R. P., & Barnard, C. Locus of control: An example
     of dangers in using children's scales with children. Child Development,
     1972, 43, 579-590.

Hall, B. R., Denton, C. F., & Zajkowski, H. M. A comparative assessment of
     three methods of collecting training feedback information (TAEG Report
     No. 64). Orlando, FL: Training Analysis and Evaluation Group, December
     1978.

Harris, J. H., Osborn, W. C., & Boldovici, J. A. A paired-comparison approach
     for estimating task criticality. In Osborn, W. C., Ford, J. P., Campbell,
     C. H., Campbell, R. C., Harris, J. H., & Boldovici, J. A. Military
     training: Knowledge and skills (Professional Paper >%-78). Alexandria, VA:
     Human Resources Research Organization, February 1978.

Harris, J. H., Campbell, C. H., & Osborn, W. C. An attempt to identify
     indicators of competence on mechanical maintenance tasks (Final Report
     79-1). Alexandria, VA: Human Resources Research Organization, January 1979.

Heymont, I. What is the Army getting for its training dollar? Army, 1977
     (June), 34-38.

Hiller, J. H. Learning from prose text: Effects of readability level,
     inserted question difficulty, and individual differences. Journal of
     Educational Psychology, 1974, 66, 202-211.

Hiller, J. H. A methodology for estimating the cost-effectiveness of
     alternative pretesting procedures (Technical Report 502). Alexandria, VA:
     US Army Research Institute for the Behavioral and Social Sciences,
     November 1980. (ADA 115 877)

Holzbach, R. L. Rater bias in performance ratings: Superior, self-, and peer
     ratings. Journal of Applied Psychology, 1978, 63, 579-588.

Hook, C. M. & Rosenshine, B. V. Accuracy of teacher reports of their classroom
     behavior. Review of Educational Research, 1979, 49, 1-12.



Johnson, C. A., Tokunaga, H. T., & Hiller, J. H. Validation of a job analysis
     questionnaire against intensive observation. Paper presented at the
     Military Testing Association Conference, Toronto, October 1980.

Kahneman, D. & Tversky, A. On the psychology of prediction. Psychological
     Review, 1973, 80, 237-251.

Kane, J. S. & Lawler, E. E. Methods of peer assessment. Psychological
     Bulletin, 1978, 85, 555-586.

Kroll, N. E. A. & Kellicutt, H. K. Short-term recall as a function of covert
     rehearsal and of intervening task. Journal of Verbal Learning and Verbal
     Behavior, 1972, 11, 196-204.

Lachman, J. L., Lachman, R., & Thronesbery, C. Metamemory through the adult
     life span. Developmental Psychology, 1979, 15, 543-551.

Lammlein, S. E. & Borman, W. C. Peer rating research: Annotated bibliography
     (Technical Report 79-9). Brooks Air Force Base, TX: Air Force Human
     Resources Laboratory, June 1979.

Lieberman, D. A. Behaviorism and the mind: A (limited) call for a return to
     introspection. American Psychologist, 1979, 34, 319-333.

Levine, E. L. Introductory remarks for the symposium "Organizational
     applications of self-appraisal and self-assessment: Another look."
     Personnel Psychology, 1980, 33, 259-262.

Levine, E. L., Flory, A., & Ash, R. A. Self-assessment in personnel selection.
     Journal of Applied Psychology, 1977, 62, 428-435.

MacLane, C. Promotion evaluation for inter-organizational referral: A
     behavioral expectation approach. Paper presented at the Military Testing
     Association Conference, San Antonio, October 1977.

Medlin, S. M. & Thompson, P. Evaluator rating of unit performance in field
     exercises: A multidimensional scaling analysis (Technical Report 438).
     Alexandria, VA: US Army Research Institute for the Behavioral and Social
     Sciences, April 1980. (ADA 089 264)

Meyer, H. H. Self-appraisal of job performance. Personnel Psychology, 1980,
     33, 291-296.

Mitchell, S. K. Interobserver agreement, reliability, and generalizability of
     data collected in observational studies. Psychological Bulletin, 1979, 86,
     376-390.

Moreland, R., Miller, J., & Laucka, F. Academic achievement and self-evaluation
     of academic performance. Journal of Educational Psychology, 1981, 73,
     335-344.



Nisbett, R. E. & Wilson, T. D. Telling more than we can know: Verbal reports
     on mental processes. Psychological Review, 1977, 84, 231-259.

Pourchot, L. & Lanning, F. The self-concept as predictor of scores on the
     Pourchot Mechanical Manipulation Test. Journal of the Association for the
     Study of Perception, 1979, 14, 6-11.

Primoff, E. S. The use of self-assessments in examining (Professional Series
     79-1). Washington, DC: Office of Personnel Management, Personnel Research
     and Development Center, April 1979.

Primoff, E. S. The use of self-assessments in examining. Personnel Psychology,
     1980, 33, 283-290.

Robinson, J. A. & Kulp, R. A. Knowledge of prior recall. Journal of Verbal
     Learning and Verbal Behavior, 1970, 9, 84-86.

Rose, A. M. & Wheaton, G. R. Performance effectiveness in combat job
     specialties (Final Report 5173-66200). Washington, D. C.: American
     Institutes for Research, May 1978.

Ryan-Jones, D. L. A comparison of expert ratings of task difficulty with an
     independent criterion (Technical Report 418). Alexandria, VA: US Army
     Research Institute for the Behavioral and Social Sciences, November 1979.
     (ADA 082 016)

Schendel, J. D. & Hagman, J. D. On sustaining procedural skills over a
     prolonged retention interval. Journal of Applied Psychology, in press.

Shaughnessy, J. J. Confidence-judgment accuracy as a predictor of test
     performance. Journal of Research in Personality, 1979, 13, 505-514.

Shavelson, R. & Dempsey-Atwood, N. Generalizability of measures of teaching
     behavior. Review of Educational Research, 1976, 46, 553-611.

Shields, J. L., Goldberg, S. L., & Dressel, J. D. Retention of basic
     soldiering skills (Research Report 1225). Alexandria, VA: US Army Research
     Institute for the Behavioral and Social Sciences, September 1979.
     (ADA 075 412)

Shrauger, J. S. & Osberg, T. M. The relative accuracy of self-predictions and
     judgments by others in psychological assessment. Psychological Bulletin,
     1981, 90, 322-351.

Shvern, U. Field evaluation of the combat commander's guide to aerial
     surveillance and reconnaissance resources (Technical Paper 380).
     Alexandria, VA: US Army Research Institute for the Behavioral and Social
     Sciences, July 1979. (ADA 075 422)

Smith, E. R. & Miller, F. D. Limits on perception of cognitive processes:
     A reply to Nisbett and Wilson. Psychological Review, 1978, 85, 355-362.

Thornton, G. C. Psychometric properties of self-appraisals of job performance.
     Personnel Psychology, 1980, 33, 263-272.

Tulving, E. Episodic and semantic memory. In E. Tulving and W. Donaldson
     (Eds.), Organization of memory. New York: Academic Press, 1972.

Turney, J. R. & Cohen, S. L. Perceived work effort as time devoted to an
     activity (Technical Paper 337). Alexandria, VA: US Army Research
     Institute for the Behavioral and Social Sciences, September 1978.
     (ADA 062 411)

US Army Field Manual 101-5, Staff officers field manual: Staff organization
     and procedure. Washington, D. C.: Headquarters, Department of the Army,
     July 1972.

US Army Regulation 220-1, Unit status reporting. Washington, D. C.:
     Headquarters, Department of the Army, June 1978.

US Army Training and Doctrine Command Pamphlet 350-30, Interservice procedures
     for instructional systems development. Fort Benning, GA: Combat Arms
     Training Board, August 1975.

US Army Training and Doctrine Command Draft Regulation 350-7, A systems
     approach to training. Fort Monroe, VA: Headquarters, US Army Training and
     Doctrine Command, undated.

US Army Training Study. Fort Belvoir, VA, 1978.

van Rijn, P. Self-assessment for personnel examining: An overview (Personnel
     Research Report 80-14). Washington, D. C.: Office of Personnel
     Management, Personnel Research and Development Center, June 1980.

van Rijn, P. Self-assessment in personnel selection and placement. Paper
     presented at the Military Testing Association Conference, Arlington, VA,
     October 1981.

Wills, T. A. Perceptions of clients by professional helpers. Psychological
     Bulletin, 1978, 85, 344-358.


